TermFactory Manual

© Lauri Carlson 2007-2012

TermFactory
Helsinki, Jan 2, 2012

Introduction

This document describes the TermFactory approach and architecture. With suitable styling, it serves as a white paper, a status report, a history, and a manual. Of the styles listed above, white is for those who just want to read about TermFactory, black is for TF users, small is for administrators, and all is for developers. Style 'talk' shows selected parts of the document as a slideshow.

If this is the last sentence of its paragraph, you are seeing the white paper. If this sentence is visible, user guide material is included. Finer detail is in small print. Text in grey is background - not invented here. Text in blue is blueprint - what was in the plans. Text in green is green paper content - what might be nice to have. Text in red is deprecated.

About

What TermFactory (TF) is all about

TermFactory is an architecture and a workflow for Semantic Web based, multilingual, collaborative terminology work. What this means in practice is that it applies Semantic Web and other document and language technology standards to the representation of multilingual special language terms and the related concepts, and provides a plan for how such terminologies can be collected, updated, and agreed upon by professionals, not only terminology professionals, all over the globe, during their everyday work on virtual work platforms over the web. As a whole, TF could be termed a semantic web framework for multilingual terminology work.

TF provides

  • ontology and terminology formats
  • format conversions
  • query and edit tools
  • repositories
  • web services

for people to work on terms jointly or separately, building on the results of the work of others, while maintaining quality and consistency between the different contributions.

There is not much that TermFactory invents out of whole cloth, yet the totality is novel, in that there is no terminology management solution in existence that comes even close to doing what TF promises to do. Being a combination of existing and tried technologies, TF is not science fiction. Yet it is a complex enough totality that it needs careful planning and thoughtful implementation. This evolving document is a report of that planning and implementation process. It is to be complemented with more easily digestible guides as the product matures.

The aim of TermFactory

TermFactory comprises software based on Semantic Web technologies, together with operating guidelines.

The purpose of TermFactory is computer-aided, collaborative, professional (precise) multilingual terminology work on the web.

A model implementation of a TermFactory community platform is being built and will be tried out by an evaluation community. However, we try to keep the design as free of platform specific detail as possible. The main research focus is on making Semantic Web technology useful for global multilingual terminology work.

As for workflow, TF tries out these ideas in an experimental terminology network, and by so doing devises a guide of best practices as to how the collaborative distributed work pattern can be made to produce useful results with the least hassle. There are many practical questions to solve, not only technical ones but ones having to do with community building, access rights, authentication, division of labor, organisation and timing of different phases of work, allocation of authority, intellectual property rights, etc. Many of these issues it shares with other collaborative work processes, so TF does not have to invent everything, but rather survey and choose suitable solutions.

One is perhaps tempted to ask, what is new here, what is the gain from TF? There have been a lot of terminology data storage solutions before. There are more efficient uses of relational databases than current persistent ontology repositories. There are many standards and tools for terminological data management, including XML based ones like TBX. There are old and well-tried query and reasoning tools, including SQL and Prolog. What is there to gain from using another burgeoning technology instead of the old tricks? Is it just the novelty value?

The expected benefits of TF as compared to the current state of the art terminology tools include

  • Openness and conformance
    • With ontology standards and tools, both conceptual and linguistic content can be globally identified and mechanically validated.
  • Flexible reuse of content
    • Allow third party ontologies to coexist and (co)develop on separate sites.
    • Different terminology styles can coexist and (co)develop in the same repositories.
  • Ease of implementation and deployment
    • Contents are usable by third party ontology tools
    • Help divide and conquer big ontologies

Ontology tools are not about runtime efficiency in (say) online web applications. They save human work and make possible the management of much larger collections of terminological data than before. Ontology work happens in the background.

The TF architecture

The TF architecture is not embodied in one software package, but an array of standards and software, both web and standalone tools, that allow different actors in different roles to collaborate to produce a shared resource. This shared resource is a distributed multi-domain special language ontology of multilingual terminology. It can be used to organise content and standardise communication in global multilingual organisations, enhance exchange of ideas and innovations across a multinational workforce, or facilitate understanding and support education across language barriers. The core of the system is formed by a hierarchy of Semantic Web ontologies served by a network of interconnected repositories. The human interfaces to the network are composed of online web collaborative work platforms and offline professional terminology and ontology tools.

TF pyramid

The TF pyramid is not necessarily cuneiform in reality, it can be more of a cumulus cloud. It is drawn with a triangle here just because there are many more special concepts than general ones, and many more people working at the lower rungs than at the top. The divisions of the cone depict both the composite structure of the content and the division of work between experts of different things.

Figure: TF pyramid (TF architecture)

TermFactory philosophy

From a philosophical perspective, all communication, including multilingual communication, involves translation, that is, conversion from one format to another. TF is specifically about interlingual translation of terms: internationalization of local content to make it globally shared, and localisation of globally shared content back home.

A familiar notion that helps explain TermFactory philosophy was explained in an old paper of mine (Questions of Identity in Discourse, 1989). Consider how we choose names or addresses we use to identify persons (places) depending on the size of the field of search. In the middle of a dialogue, we point or use pronouns. In talk among family and friends, we use first names and nicknames. Within a wider circle, full names are used. In official contexts, some ID is required. When there are none, a description is offered. In each context we use the name (address) that is most convenient among those that sufficiently identify the object. This also covers naming across cultures and languages: we switch to another code that works best in that context. Though TF primarily deals with multilingual terms, it is really about globalisation and localisation of names of any kind.

From this point of view, the simple answer to what TermFactory is about is this: it is a tool for facilitating the choice of names and addresses to suit changes of context. Starting with a local ordinary name for a thing in some language, it allows narrowing down the intended meaning to a special domain concept that has a global identifier, a URI. This global concept is in turn localized to terms and expressions in another language or culture. The facilities offered by TermFactory for interlingual intercultural communication conform to classical terminology theory, but are not confined to terminology only. Considered as an abstract machine, TermFactory creates a complex of string to string mappings, mediated by a semantic network expressed in description logic.

In a more mathematical vein: two category theoretic insights underlie much of TF.

  1. Translations are mappings, morphisms in a category of languages. Globalisation, or interlingual translation of terms through the composition of internationalization followed by localization (i18n o L10n), uses the interlingua of language independent concepts as a category theoretic limit (universal element) of that category.
  2. Universal elements are good for space efficiency and manipulation. Limits in a category use one thing to represent many, taking out the slack from redundant mappings.
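
The space argument in the two points above can be made concrete with a little arithmetic. The sketch below is only an illustration, under the assumption that every pair of languages would otherwise need its own directed mapping; the function names are hypothetical and not part of TF.

```python
def direct_mappings(n: int) -> int:
    """Directed translation mappings needed between every ordered pair of n languages."""
    return n * (n - 1)


def hub_mappings(n: int) -> int:
    """Mappings needed when each language maps to and from one shared interlingua."""
    return 2 * n


# Compare the two counts for a few community sizes.
for n in (2, 3, 10, 24):
    print(n, direct_mappings(n), hub_mappings(n))
```

Under this assumption the two counts are equal at three languages; beyond that, going through the shared concept layer needs fewer mappings to maintain, and the gap grows quadratically.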

Space (thereby maintenance) efficiency is the motivation for having global names in the first place. That is also the ultimate motivation for terminology theory for distinguishing concepts from terms. The TF distinction between terms and expressions is its dual, motivated by multilinguality.

All tasks in TF involve format conversion, and many tasks involve globalisation. URIs are global. Expressions in a (sub)language are local. TF tries to make them global, by giving them URIs. At the same time, it goes to no end of trouble to make that global content locally usable. It is efficient and convenient in a given context to name anything with the shortest name that is unique in it. When the context is large, the names get longer. TF aims to make both optimisations possible at once: to globalize, and to localize at will.

Globality is the motivation of the Semantic Web. Its weakness is the downside of globalisation, namely loss of local (time) efficiency. Going through a limit is space/resource efficient, like a tree is resource efficient compared to a graph, but it is time consuming for the same reason. Having direct routes is faster if cost is no object. In the web, fast traffic is peer to peer. In the short run, nobody has the patience to look for long URIs, and nobody cares to check if a given local thing already exists globally. Result: a lot of duplication, proliferation of different URIs for the same thing. The homonym problem globalisation sets out to solve is replaced by an equally intractable synonym problem. In a word, ontology hell.

The TermFactory architecture is meant to address this problem. (It cannot solve the problem at large, just think of the scale. It is enough if TF helps tame it in smaller circles.) The solution is something like this: before inventing another global concept associated to a local name, look up in TF what there is already under that local name, and borrow or subclass one of them. If nothing suitable is found, share your innovation.

It is important to observe that there is relatively little in the mechanism of TF that is specifically hardcoded about multilingual terms. That is all in the data (the ontologies). Mostly, TF is a bunch of Semantic Web mechanisms to do mappings between strings using description logic based techniques.

Ontologies

An ontology, in the current Semantic Web sense of the word, is a collection of semantic descriptions of concepts in a formal language explicit enough to be processed by a machine.

Ontologies are direct descendants of the semantic networks of 70's artificial intelligence. AI systems used semantic networks to describe the world to a machine so that the machine could behave intelligently. There was nothing wrong with the idea in itself, it just turned out to be infeasible to describe enough of the world to make the machines behave intelligently enough, and the AI hype died out.

The idea got a new lease of life by the (already somewhat subdued) hype surrounding the Semantic Web, a new generation Internet where intelligent machines are able to understand and process information produced by and for people. It is supposed to upgrade the current human-interactive Web 2.0 to a human-machine interactive Web 3.0.

At least for the time being, machines are worse than people at guessing meaning (IBM's success in TV trivia notwithstanding). To make information accessible to them, people have to be more explicit about what they mean than they have to when communicating to other humans. Specifically, meanings must be annotated, marked up using some metalanguage in web documents for machines to read.

The first step in the 90's toward man-machine readability was XML, the eXtensible Markup Language to make document structure explicit to people and machines. XML gives documents a treelike structure but no particular semantics; the meaning, if any, is up to the user to provide. The ontology languages want to also fix meaning.

The base language for the Semantic Web is the Resource Description Framework language RDF. It is a language for constructing semantic networks as labeled graphs, not just trees like XML. RDF is actually independent of XML, although RDF graphs are by default written in XML. The native "language" of RDF is statements in the form of subject-predicate-object triples, for instance (1)

ont1:Fido rdf:type ont:Dog .
ont:Dog rdfs:subClassOf ont:Pet .
ont1:Sue ont:owns ont1:Fido .

saying that Fido is a dog, dogs are pets, and Sue owns Fido. RDF graphs are semantic networks built out of such triples.

The web being global, RDF allows identifying the concepts that appear in the triples with global identifiers, called, depending on the variety, URLs (uniform resource locators), URIs (uniform resource identifiers), IRIs (internationalized resource identifiers) or URNs (uniform resource names). For instance, the dog's name ont1:Fido above is an abbreviation for the full TermFactory name of a particular globally unique dog, http://www.tfs.cc/ont1/Fido . There are other Fidos in the world, but only one by this name. Global identifiers are one of the main advantages of RDF that make TermFactory possible. Using global identifiers (URLs, URIs, IRIs or URNs), concepts can be identified uniquely and traced back to their owners. In the above example, the meaning of the predicates rdf:type and rdfs:subClassOf is fixed (and explained) by the W3C. The rest of the named entities are owned and documented by TermFactory. Another site might have their own Sue and Fido or a different concept of ownership, identified by different URIs.
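
How an abbreviated name relates to its global identifier can be sketched in a few lines of Python. This is an illustration only: the rdf and rdfs namespaces are the W3C ones and the ont1 namespace is taken from the example above, but the namespace given for ont is an assumption, and the helper names expand and abbreviate are hypothetical.

```python
# Prefix table: abbreviation -> namespace URI.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",  # W3C namespace
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",       # W3C namespace
    "ont1": "http://www.tfs.cc/ont1/",                     # from the example in the text
    "ont": "http://www.tfs.cc/ont/",                       # assumed for illustration
}


def expand(qname: str) -> str:
    """Expand an abbreviated name such as 'ont1:Fido' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return PREFIXES[prefix] + local


def abbreviate(uri: str) -> str:
    """Inverse of expand(): use the longest namespace that matches the URI."""
    matches = [(ns, p) for p, ns in PREFIXES.items() if uri.startswith(ns)]
    if not matches:
        return uri  # no known namespace: leave the full URI as it is
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(ns):]}"


print(expand("ont1:Fido"))
```

The short form is convenient inside one document, where the prefix table is known; the long form is what identifies the resource globally.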

The more semantics an ontology language has, the more meaning it can express and the fewer statements it needs to express it. But then all the consequences are no longer explicitly stated, and to unpack what is only implied, an inference engine, or reasoner, is needed. For instance, from (1) a reasoner can infer that Fido is a pet, or that Sue owns a pet, though these facts are not explicitly stated in the ontology. The richer the language, the more it can express, but the harder its inference problem, so the trick in defining an ontology language is to find a useful balance between expressive power and tractability.
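
To give a feel for what a reasoner does with (1), here is a minimal forward-chaining sketch in plain Python. It is not how a production reasoner works: it applies just two RDFS rules (transitivity of rdfs:subClassOf and inheritance of class membership), and the function names are hypothetical.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

# The three triples of example (1).
triples = {
    ("ont1:Fido", TYPE, "ont:Dog"),
    ("ont:Dog", SUBCLASS, "ont:Pet"),
    ("ont1:Sue", "ont:owns", "ont1:Fido"),
}


def rdfs_closure(graph):
    """Apply two RDFS rules repeatedly until no new triple appears."""
    inferred = set(graph)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:   # subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:     # instances inherit superclasses
                    new.add((s, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


def owns_a(graph, owner, cls):
    """True if the owner owns something that is an instance of cls."""
    return any(
        (thing, TYPE, cls) in graph
        for s, p, thing in graph
        if s == owner and p == "ont:owns"
    )


closed = rdfs_closure(triples)
print(("ont1:Fido", TYPE, "ont:Pet") in closed)   # Fido is a pet
print(owns_a(closed, "ont1:Sue", "ont:Pet"))      # Sue owns a pet
```

Neither answer can be read off the stated triples alone; both become available once the implied triple has been added.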

RDF

RDF has a graph semantics. An RDF model can be visualized as a binary graph made up of labeled arcs.

figure 1

Such an RDF graph consists of triples each consisting of two nodes connected by a binary labeled arc. The nodes can be labeled or blank (anonymous). An RDF graph need not be connected. RDF graphs can be named. RDF nodes are either resources or literals. The arcs are called properties. Literals, for instance strings, numbers, or dates, have a label, but no URI and no properties. Resources may have a URI and properties. (They need not have either.) A resource with a URI is a named resource, one without is an anonymous resource, alias blank (node). Blank nodes have fixed identity only inside the containing graph. Constant RDF triples (those without anonymous nodes) are completely self-contained pieces of information that can be bartered between documents and ontologies. This is one of the strengths of RDF. An RDF model is an implementation level concept for a container holding one or more RDF graphs.

RDF has a standard way to assign properties to a triple by way of reification: a quad(ruple) of triples names a triple and associates to the name its predicate, subject, and object. Reification is not much used in practice.
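
As a sketch of what reification amounts to, the following Python function builds the four triples that name a statement, plus any properties one wants to hang on it. The rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms are the standard RDF reification vocabulary; the blank node labels and the ex:source property are hypothetical and chosen only for illustration.

```python
from itertools import count

_ids = count(1)  # generates fresh blank node labels


def reify(triple, annotations=None):
    """Return the triples that name `triple` and describe it.

    The statement node is a blank node; the `_:stmtN` labels are an
    illustrative naming choice.
    """
    s, p, o = triple
    node = f"_:stmt{next(_ids)}"
    quad = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]
    # Properties of the statement itself, e.g. where it came from.
    extra = [(node, prop, value) for prop, value in (annotations or {}).items()]
    return quad + extra


for t in reify(("ont1:Sue", "ont:owns", "ont1:Fido"), {"ex:source": "ex:Registry"}):
    print(t)
```

The cost is visible at once: one statement becomes four before anything is said about it, which may be one reason reification sees little use.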

There are a variety of formats to represent an RDF graph as text. Some are described in the section on TF formats further on.

RDFS

RDF Schema (RDFS) is a meta vocabulary for defining RDF vocabularies. It allows typing resources into classes and subclasses, and describing RDF properties.

An RDF property is a URI (named resource) that represents a binary relation viewed object-orientedly, as a property of members of its domain, taking values from a range. RDF properties are many-valued. RDF properties can be partially ordered with respect to generality, using the second order property rdfs:subPropertyOf . RDFS semantics says that properties rdfs:subClassOf and rdfs:subPropertyOf are transitive. An RDF query for a property should also return triples labeled with its subproperties.
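
The last point, that a query for a property should also see its subproperties, can be sketched as follows. The ex: properties and individuals are hypothetical examples, and the functions are a plain Python illustration rather than any particular query engine.

```python
SUBPROP = "rdfs:subPropertyOf"

# Hypothetical data: hasMother is a kind of hasParent, which is a kind of hasRelative.
graph = {
    ("ex:hasMother", SUBPROP, "ex:hasParent"),
    ("ex:hasParent", SUBPROP, "ex:hasRelative"),
    ("ex:Ann", "ex:hasMother", "ex:Eve"),
    ("ex:Ann", "ex:hasParent", "ex:Bob"),
    ("ex:Ann", "ex:hasRelative", "ex:Cy"),
}


def subproperties(graph, prop):
    """Return prop together with all its direct and indirect subproperties."""
    found, frontier = {prop}, [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == SUBPROP and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def query(graph, subject, prop):
    """Values of prop for subject, including values of its subproperties."""
    props = subproperties(graph, prop)
    return sorted(o for s, p, o in graph if s == subject and p in props)


print(query(graph, "ex:Ann", "ex:hasRelative"))
print(query(graph, "ex:Ann", "ex:hasParent"))
```

Because rdfs:subPropertyOf is transitive, the query for the most general property collects the values of all three properties, while the query for ex:hasParent leaves out the value stated only for ex:hasRelative.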

OWL

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies, and is endorsed by the World Wide Web Consortium. OWL is considered one of the fundamental technologies underpinning the Semantic Web.

The Web Ontology Language OWL is a subset of classical predicate logic designed to express the kind of statements used by people in defining concepts since Aristotle. The most popular subset of OWL, the description logic fragment known as OWL DL, can express concepts and relationships like these: (translation to Manchester OWL syntax in grey)

Fido is a dog.            :Fido rdf:type :Dog
Dogs are pets.            :Dog rdfs:subClassOf :Pet
Pets have masters.        :Pet rdfs:subClassOf :hasMaster some :Human
Humans are not pets.      :Human owl:disjointWith :Pet
Only humans have pets.    :Pet rdfs:subClassOf :hasMaster only :Human
Fido has two masters.     :Fido rdf:type :hasMaster min 2

The main forte of OWL is that it not only expresses those things (for the human reader), but machines can actually reason with them. Description logic engines like FaCT++ or Pellet are able to relate such facts to one another automatically and give intelligent, that is, reasoned answers to questions about them, combining facts and drawing consequences that are only implicit in them. That is more than the average relational database does, maybe more than the average person is wont to do.

OWL semantics

Like RDF, OWL has a graph semantics, consisting of nodes connected by binary labeled arcs. But OWL goes past RDF by defining logical constructs with more complex semantics, whose RDF representation consists of several triples. An OWL processor knows to keep such triples together and map between them and alternative OWL syntaxes like OWL functional syntax or Manchester syntax.

OWL properties

OWL properties divide into three disjoint types: object properties, datatype properties, and annotation properties. An object property ranges over resources, a datatype property over literals. An annotation property's value can be anything. In OWL, there is no top property common to all, so an OWL query must always specify which type of property is being sought.

OWL annotations are for information that does not describe the domain itself but talks about the description of the domain. Annotation information is not part of the logical meaning of an ontology, which in practice means that it does not take part in OWL DL reasoning. In OWL 2 DL, one can declare domains, ranges and sub-properties for annotation properties. (See http://bloody-byte.net/rdf/dc_owl2dl/index.html .)

OWL versions and profiles

During the CF project, OWL has passed through two versions, OWL 1.0 and 2.0. The OWL 1.0 family of languages is based on two (largely, but not entirely, compatible) semantics: OWL DL and OWL Lite semantics are based on Description Logics, which have attractive and well-understood computational properties, while OWL Full is intended to provide compatibility with RDF Schema. OWL ontologies are commonly serialized using RDF/XML syntax.

In October 2007, a new W3C working group was started to extend OWL with several new features as proposed in the OWL 1.1 member submission. This new version, called OWL 2, has already found its way into semantic editors such as Protégé and semantic reasoners such as Pellet, FaCT++ or HermiT. For TF, OWL 2 has better metamodeling and annotations.

The new features of OWL 2 include

  1. extra syntactic sugar to make some common statements easier to say, e.g., the disjoint union of classes
  2. new constructs that increase the expressivity for properties, e.g., qualified cardinality restrictions or property chain inclusion, database style keys
  3. extended support for datatypes, e.g., data type restrictions and facets for restricting a datatype to a subset of its values
  4. simple metamodeling capabilities to express metalogical information about the entities of an ontology
  5. extended annotations capabilities to annotate entities, ontologies and also axioms
  6. other major innovations: declarations, new language profiles (sublanguages).

OWL 2.0 comes in three profiles, OWL 2 EL, OWL 2 QL and OWL 2 RL.

  • OWL 2 EL
    • Captures expressive power used by many large-scale ontologies, e.g., SNOMED CT, the NCI thesaurus;
    • Features include existential restrictions, intersection, subClass, equivalentClass, class disjointness, range and domain, object property inclusion (SubObjectPropertyOf), possibly involving property chains, and data property inclusion (SubDataPropertyOf), transitive properties, keys (HasKey) …;
    • Missing features include value restrictions, cardinality restrictions (min, max and exact), disjunction and negation.
  • OWL 2 QL
    • Captures expressive power of simple ontologies like thesauri, and (most of) expressive power of ER/UML schemas;
    • Features include limited form of existential restrictions, subClass, equivalentClass, disjointness, range and domain, symmetric properties, …;
    • Missing features include existential quantification to a class (ObjectSomeValuesFrom), self restriction (ObjectHasSelf), nominals (ObjectHasValue, ObjectOneOf), universal quantification to a class (ObjectAllValuesFrom), cardinality restrictions (ObjectMinCardinality, ObjectExactCardinality), disjunction (ObjectUnionOf, DisjointUnion), etc. See the Profile document for an exhaustive list of missing features.
    • Can be implemented on top of a standard relational database.
  • OWL 2 RL
    • Includes support for most OWL 2 features;
    • But with restrictions placed on the syntax, for example it does not include existential on the right hand side of axioms (which often occurs in Life Sciences ontologies, e.g., SNOMED). Standard semantics only apply when they are used in a restricted way;
    • Can be implemented on top of rule extended DBMS e.g., SQL (see Implementation Perspective).
  • OWL 2 EL is the maximal language for which reasoning, including query answering, is known to be worst-case polynomial.
  • OWL 2 QL is the maximal language for which reasoning, including query answering, is known to be worst case logspace (same as DB).
  • OWL 2 RL allows for polynomial reasoning (consistency, classification, and instance checking) using rule-based technologies.

There is an online OWL 2 validator and an OWL syntax converter.

Instances, classes, properties and roles

As a species of description logic, OWL is a decidable subset of first order predicate logic with an object-oriented syntax. As any first order logic, it makes a sortal distinction between instances, or "things out there" and classes of such things. OWL goes past simple classification since it can talk about two-place relations. A binary relation is viewed object-orientedly (or should one say egocentrically) as a property of its subject, having the object as value. Nevertheless, such properties are by default multivalued, and they can have inverses, like parent and child. Most characteristically, OWL deals with roles. Roles are classes one belongs to in virtue of having a property (that is, a relation to something). Most of our apparent inherent properties are at bottom roles in some bigger configuration. Something is big only in relation to something small, or a master only as the master of a slave. OWL cannot say very complicated things, but it is good at defining roles. And roles are a good approximation to the classical Aristotelian way of forming terminological definitions: "A species is that subclass of a genus which has the differentiae that...". To give the same point a linguistic turn: OWL has a good match with the natural language syntax of relational nouns and relative clauses traditionally used in terminological definitions.
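
The notion of a role as a class one belongs to in virtue of having a property can be sketched in plain Python. The individuals, the ex:hasChild / ex:hasParent pair and the function names are hypothetical, and a real OWL reasoner does considerably more; the sketch only shows membership in "has some child" being derived from relations, including relations stated through the inverse property.

```python
# Hypothetical facts; hasChild and hasParent are declared inverses of each other.
facts = {
    ("ex:Sue", "ex:hasChild", "ex:Tom"),
    ("ex:Ann", "ex:hasParent", "ex:Bob"),
    ("ex:Tom", "rdf:type", "ex:Human"),
}
INVERSES = {"ex:hasChild": "ex:hasParent", "ex:hasParent": "ex:hasChild"}


def with_inverses(facts):
    """Add the inverse triple for every triple whose property has an inverse."""
    extra = {(o, INVERSES[p], s) for s, p, o in facts if p in INVERSES}
    return set(facts) | extra


def role_members(facts, prop, filler=None):
    """Subjects having at least one `prop` value (of type `filler`, if given)."""
    return {
        s
        for s, p, o in facts
        if p == prop and (filler is None or (o, "rdf:type", filler) in facts)
    }


closed = with_inverses(facts)
print(sorted(role_members(closed, "ex:hasChild")))              # the role "parent"
print(sorted(role_members(closed, "ex:hasChild", "ex:Human")))  # "parent of a human"
```

Nobody is declared a parent here; being one follows from standing in the relation. Under the open world assumption the converse does not hold: absence from the result does not show that someone is not a parent.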

The TermFactory ontology schema

This section describes the TermFactory ontology schema TFS.owl . The TermFactory ontology schema defines the skeleton of a multilingual terminological ontology in OWL DL, trying to conform to current terminology standards and other language technology standards , rich and precise enough to support semantic inference and language technology applications.

One of the liberating insights from describing terminology in a logic rather than by a fixed schema - whether hierarchical (tree structured) like XML or tabular (record structured) like a relational database - is that there is nothing sacrosanct about the TFS schema vocabulary. Not only does TermFactory have no fixed entry format, there is no unique fixed signature either. As long as there are ontologies and reasoning, there does not have to be a fixed meta vocabulary for terms. A seed vocabulary like TFS.owl and its companions is there only to help thinking and provide a fixed point for automated reasoning. Beyond that, everyone is free to use their own vocabulary and conceptualization, as long as an ontology is supplied for it, plus bridge rules to map it to others, and eventually to TFS.owl in particular.

Like classical logic, OWL has an open world assumption built in. There is no way in first order logic to quantify over properties, hence to enumerate all and only the properties that a class allows to its members. As a consequence, an OWL schema can only name individual properties that must or must not be present. In this respect an RDF/OWL schema differs from a database or XML schema, which work with a closed world assumption (whatever is not specifically allowed is prohibited).

To be precise, OWL cannot express property closure "no other properties are allowed for members of this class". It is possible to say of some given class and property that they are incompatible. One can say for instance

ex:Class owl:disjointWith [ rdf:type owl:Restriction ; owl:onProperty ex:property ; owl:someValuesFrom owl:Thing ] .

or the other way round,

ex:property rdfs:range [ owl:complementOf ex:Class ] .

Unlike XML, there is no formal distinction in OWL between an ontology and an ontology schema. One just draws a line somewhere between statements that must hold in all TermFactory ontologies and the rest. The common core is called the TF ontology schema (TFS.owl). Together with its extensions (TFStrict.owl, TFTop.owl, TFSem.owl) and profiles (LegacyStrict.owl, DictionaryStrict.owl) it forms the top ontology in TF.

Like classical logic, OWL is weakly typed. It is possible, but not obligatory, to specify individual type, superclass of class, and domain and range of object properties. The default is the vacuous type owl:Thing . This weakness can be a strength. With strict typing, it can be necessary to create dummy objects just to satisfy typing requirements. In TF, it is rarely necessary. Therefore TF can accommodate a variety of approaches to terminological description. (See section on term properties below.)

The OWL standard in itself is about semantics, or concepts in abstraction from the (human) language in which they may be expressed. Each concept must have a global identifier to hang descriptions on, but as long as it is a valid URI, it matters little what that identifier is. Beyond that, the standard has little to say about language or multilinguality. Basically, the standard suggests using attribute rdfs:label with values in localised language-tagged Unicode strings to specify alternative human readable labels for classes. This is not nearly enough for proper multilingual language (technology) support. The main claim to fame for TF over and above OWL alone is that TF defines an explicit OWL ontology of human language expressions and terms that allows full control of the language associated to concepts.

TF terms

A TF term, an instance of TF class term:Term , is an association (ordered pair) of a language independent concept with a natural language designation. (This accords with terminology standard DIN 2342-1 and de Saussure's definition of a linguistic sign.)

Consider the following example of the two strings home and hallitus . English home has a general language meaning, "abode", and a special domain meaning, "institution" (as in "we must put granny into a home"). The same string means mildew in Finnish. The Estonian word for mildew is hallitus . The same string in Finnish means government. Each such pairing of a meaning (blue) with a form (yellow) is a term/sign (green) in TF.
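
The home / hallitus example can be written down as data. The sketch below is a plain Python illustration, not the TF schema: the class names, the concept identifiers (ex:Abode and so on) and the helper functions are hypothetical. It shows a term as a pair of a concept and a language-tagged designation, and translation as a string to string mapping that goes through the shared concept.

```python
from typing import NamedTuple


class Designation(NamedTuple):
    text: str
    lang: str


class Term(NamedTuple):
    concept: str              # global identifier (hypothetical URIs here)
    designation: Designation  # language-specific form


TERMS = [
    Term("ex:Abode", Designation("home", "en")),
    Term("ex:CareInstitution", Designation("home", "en")),
    Term("ex:Mildew", Designation("home", "fi")),
    Term("ex:Mildew", Designation("hallitus", "et")),
    Term("ex:Government", Designation("hallitus", "fi")),
]


def concepts_for(text, lang):
    """Concepts that the string designates in the given language."""
    wanted = Designation(text, lang)
    return sorted({t.concept for t in TERMS if t.designation == wanted})


def translate(text, source, target):
    """Map a source-language string to target-language strings via shared concepts."""
    concepts = set(concepts_for(text, source))
    return sorted(
        {t.designation.text
         for t in TERMS
         if t.concept in concepts and t.designation.lang == target}
    )


print(concepts_for("home", "en"))
print(translate("home", "fi", "et"))
print(translate("hallitus", "et", "fi"))
```

The same string appears in several terms, and what keeps them apart is the concept on one side and the language on the other. Finnish hallitus has no English equivalent in this toy data, so translating it to English returns an empty list rather than a wrong guess.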

TF terms

(version 0.2) The ontology design now consists of the TermFactory schema which defines the top level concepts and their relations (TF-schema), the TermFactory top ontology which contains the semantics of machine generable definitions, and subontologies by way of examples how the ontology is to be extended.

(version 0.3) The TermFactory schema conforms to OWL 1.0 DL. It can be edited with the de facto standard ontology editor Protégé version 4 and it passes the FaCT++ and Pellet consistency checks. Crucially, OWL Full is no longer used for terms to relate designations (instances) with classes. Instead, classes are metamodeled with puns, or duplicate representative first-order instances. Classes and their representative instances have the same base name but different (predictably related) URIs. The TF class ont:Concept only contains first order representatives (puns) of special language classes ("concepts" in non-TF sense).

(version 0.6) Initially, the TermFactory schema is intentionally kept small as concerns descriptive terminological properties. It will be extended as the terminology work guidance workpackage develops. This extension will take into account existing terminology standards and recommendations. Data categories will follow the ISO recommendations when appropriate.

(version 1.9) Logic intensive integrity constraints have been moved out of TFS.owl. TFS.owl just creates a taxonomy with few constraints on what can belong where. Therefore all TFS.owl does when included in an ontology is to connect concepts according to the TF taxonomy. A separate schema file TFProp.owl holds a TF property taxonomy and defines property inverses. Another TF schema file TFStrict.owl contains the integrity constraints, such as domain and range restrictions and relational properties (inverses, functionality, transitivity, reflexivity, symmetry etc.). When TFStrict.owl is included, heavy duty reasoning can happen that can cause reasoners to hang or break on large ontologies.

(version 2.0) The TF top ontology has been moved out of TFS.owl into separate subsets TFTop.owl and TFSem.owl.

TF top ontology

A thing to keep in mind is that an OWL ontology is not (just, or even primarily) a taxonomy in the classical Aristotelian or terminological sense of a tree of genera and species. It is a semantic network, a collection of nodes and arcs representing objects and binary relations between them. As just a special case of that structure, a set of distinguished binary relations including rdf:type, rdfs:subClassOf, owl:sameAs, owl:differentFrom and owl:disjointWith gives the graph a first order monadic class structure, like Boolean algebra or Venn diagrams. That structure allows many competing taxonomies as trees spanning the graph, none the more or less true than the others. A taxonomy can be thought of as a tree-formed index to the Boolean class structure, serving the purposes of efficient insertion and retrieval of instances.

A two-dimensional tree representation of an OWL ontology is misleading if it suggests that the subclasses branching out from a class are disjoint. In general they aren't. Logically, the rdfs:subClassOf relation is a partial order, not a tree. A class can have many supers in the same "tree" (directed acyclic graph). Thus an OWL ontology tree can have as sisters classes that at best belong to parallel cross-classifying taxonomies. This happens when there are orthogonal bases of classification, like classification by structure (parts) and by function (roles). In standard terminography, there is a special notation for such many-dimensional cross-classifying taxonomies (see e.g. ISO FDIS 704 example 8). Unfortunately OWL editors do not support it.

There are two instances of such cross-cuts at the top of the TF ontology. The tripartition sign:Sign-syn:Form-sem:Meaning is cross-cut by the class term:Description. TF descriptions can belong to any of the other three classes. sem:Meaning is subdivided into object-like (ont:Concept) and proposition-like (ont:Content) meanings on the one hand, and into count (sem:Count) and noncount (sem:NonCount) entities on the other hand. These two dichotomies cross-classify to form a four-field table.

What the very top of an ontology looks like is largely philosophical; very little hinges on that choice in practice. TF, as an ontology of terms, takes its inspiration from the category theoretic abstraction of morphism, a structure preserving mapping from one domain to another. This is taken as the germ of the idea of one thing meaning or representing another. A morphism is the reification of a meaning relation as an object. A sign is a special case of such a reification, as a pairing between some form and some meaning, or as the role played by a form at the receiving end (codomain) of such a mapping. A term is a linguistic sign, a special case of a sign. Other signs, such as images, traffic signs, and formulae, are other special cases.

Unlike traditional approaches to terminology, TF has no fixed notion of a terminology entry as a record structure. (But see DESCRIBE queries.) TF builds on the semantic network metaphor, dual to the container metaphor of hierarchical databases (and more recently, XML). The container metaphor comes from physical media like paper or magnetic tape. Containment among convex objects naturally forms tree structures. In a rooted directed tree, it makes sense to talk of nodes as bigger elements containing smaller elements. In an undirected tree or graph, all nodes are equal: any node can be taken as root or focus. Nodes do not contain one another; rather, they are visualised as dots connected by links. RDF/OWL graphs are not rooted. The serialisation of a semantic network in RDF/XML need not respect connectedness. Information concerning a given node may be distributed freely among disconnected descriptions in an RDF/XML document.

As a descendant of classical logic, OWL is weakly typed. It is possible, but not obligatory, to specify individual type, superclass of class, and domain and range of object properties. The default is the vacuous type owl:Thing. TF tries to take advantage of this. Different typing regimes support different semantics (reasoning) and different use cases of TF. With strict typing, it can be necessary to create dummy objects just to satisfy typing requirements. With weak typing, it is not necessary.

An even weaker ontological position is to consider concepts, terms, and expressions all intensional objects of a kind. An expression is a term without its semantic properties, and a concept is one without expression properties. Now all instances are signs. There is no sortal distinction made between things out there and their names. Individuals get identified with individual concepts. Jesus Christ belongs to classes Human and Noun. In the more extensional use case of traditional (Western) terminology theory, these classes would be asserted to be disjoint. By Aristotle's (and Quine's) definition of ambiguity, this use case makes individuals ambiguous. Christ is both human and not human (unless human is made equally vague). Ambiguity is not harmful as long as no reasoning is done that hinges on it. One can translate correctly on the basis of form when the translation retains the same ambiguity. This theme is pursued further in the section on TermFactory schema profiles.

Another application of the relativity of the TF top triad is that TF descriptions can be forms, meanings or signs. This allows the whole gamut of attaching uninterpreted labels, language specific messages, and language independent content as descriptions of other such entities.

TF algebra

An Aristotelian insight is that the distinction between meaning and form is not absolute: not a classification but a relation. What counts as matter and what form is relative in the way domain and codomain of a morphism are relative to the morphism. In normal TF usage, an expression is a general language vocabulary item whose meaning is narrowed or transferred when it is used as a special language term. The common language expression, and common language in general, is form relative to special-language terminology. The special language expression borrows linguistic properties from the conventional meaning of the general language expression. But that general language expression can be further analysed as a linguistic sign with form and meaning. This notion accords well with the spirit of the category theoretic abstraction: morphisms are closed under composition. Terms and expressions are also (instances of) concepts. TF does not say terms and expressions are disjoint classes. An instance which is at once a term and an expression can have properties of both classes.

The underlying motive of the form-meaning morphism is the core category theoretic notion of limit (universal element). A limit in a category is an object through which all the remaining objects can be factored. It takes out the slack from the category and represents what is common to it. In translation, a language-independent (interlingual) meaning, shared by a set of synonyms, is a limit (universal element) through which an equivalence class of synonyms is factored. In graph terms, the reified meaning constitutes the hub of a spanning subtree of form-meaning relations that removes the redundancy of a square matrix of bilateral synonymy relations. Recursively applied, the reification of synonymy into meaning allows substantial savings in the size of the representation, turning a product into a sum (from m times n links to m plus n). Dually, the meaning-form morphism allows analogous savings in the representation of homonymy (in the original Aristotelian sense: two things are homonymous if they are called by the same name). Properties of the shared name need only be mentioned once. Monolingual sector terminologies have little need for this split because of the ideal of monosemy per special language: an expression should have just one sense per subject field. The situation is different across sectors and (sub)languages: one and the same expression (say operation) is used in many fields.
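
The saving can be made concrete with a small count. The following Python sketch is only an illustration and not part of TermFactory; the function names, the sample expressions and the concept name ont:Car are assumptions. It compares the number of links needed to relate m expressions of one language directly to n synonymous expressions of another with the number needed when every expression is attached once to a shared, reified meaning.

```python
from itertools import product

def bilateral_links(source_exprs, target_exprs):
    """Link every source expression directly to every target expression."""
    return set(product(source_exprs, target_exprs))

def hub_links(source_exprs, target_exprs, meaning):
    """Link every expression once to a shared, reified meaning node."""
    return {(e, meaning) for e in list(source_exprs) + list(target_exprs)}

english = ["car", "automobile", "motorcar"]   # m = 3
finnish = ["auto", "henkilöauto"]             # n = 2

direct = bilateral_links(english, finnish)
via_hub = hub_links(english, finnish, "ont:Car")  # hypothetical concept name

print(len(direct))   # m * n links
print(len(via_hub))  # m + n links
```

With sets this small the difference is one link; it grows quickly as more languages and synonyms are attached to the same meaning.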

An abstract way of looking at concepts, terms, and expressions emerges from Aristotle's insight. A term is the reification of a term:designates relationship, obtained by inserting a node and splitting the relation into the composition of two (term:designates equals term:designationOf o term:hasReferent).

More interestingly, the two extremes, concepts and expressions, appear dual. A concept is a "set" of expressions, and an expression is a "set" of concepts. This is not set theoretically sane as such, but makes category theoretic sense. Assuming both concepts and expressions form Boolean algebras (closed under join, meet, and complement) and there is a bijection between the atoms of one and the coatoms of the other ('basic' concepts have 'basic' designations, as they do in any ontology), it is possible to find for any concept a uniquely corresponding expression, and for an expression a uniquely corresponding concept. It is in the sense of this bijection that concepts "are" expressions.

Categorially taken, then, concepts "are" (match one-one with) expressions, just dually related: a concept "is" a synset, an expression "is" a homset (the set of all the concepts that it names). Except we should not talk about sets. Since we are metamodeling classes and set theoretic relationships already, we can use the metamodeling instances and properties to build a purely algebraic model. The dual map between concepts and expressions is term:designates. E designates C, where since both are Boolean algebras we can always choose E and C so that designates is one-one.

Vocabulary is not sacred in TF. Given reasoning, we can have it both ways, as long as we make the logic clear. If one wants to avoid reifying terms and signs, that too is possible, without leaving TF. Using OWL 2 property chains, TFS can define a shortcut link term:designates between a concept and an expression as the composition of the relations term:designationOf and term:hasReferent, avoiding mention of terms entirely. Similarly, sign:signifies short-circuits between forms and referents of signs.
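
What a property chain contributes can be sketched outside any reasoner. The Python fragment below is a minimal illustration under stated assumptions: triples are plain tuples of strings, the helper compose is hypothetical, and the resource names exp:fi-suomi and term:fi-suomi-Finnish are invented for the example. It derives a term:designates link wherever a term:designationOf link is followed by a term:hasReferent link.

```python
def compose(triples, first, second, result):
    """Derive (s, result, o) wherever (s, first, x) and (x, second, o) hold."""
    second_values = {}
    for s, p, o in triples:
        if p == second:
            second_values.setdefault(s, set()).add(o)
    derived = set()
    for s, p, x in triples:
        if p == first:
            for o in second_values.get(x, ()):
                derived.add((s, result, o))
    return derived

# Hypothetical expression and term resources for the Finnish word "suomi".
triples = {
    ("exp:fi-suomi", "term:designationOf", "term:fi-suomi-Finnish"),
    ("term:fi-suomi-Finnish", "term:hasReferent", "exp0:Finnish"),
}

shortcut = compose(triples, "term:designationOf", "term:hasReferent",
                   "term:designates")
print(shortcut)
```

The derived triple links the expression to the concept without mentioning the term, which is the effect the shortcut property is meant to have.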

Form and meaning as limits

With the composite property term:designates , a simple multilingual word list can be converted to equally simple TF triples:

<http://tfs.cc/exp0/Finnish> term:designatedBy [ exp:langCode "fi"; exp:baseForm "suomi" ] .

We may call this format TF Compact.

An advantage of using vocabulary consisting of term:designates and rdfs:label is that it avoids ontological questions. An rdfs:label with a language-coded string makes the least possible ontological commitment. We avoid awkward questions of expression identity and ontology mapping. Assume triples like the one above are added to a model. If one is just interested in what labels a concept might have, a query for rdfs:label will include these. If the reasoner understood composition, it would also return TF base forms that could be composed into labels. What if the query is for base forms of expressions? For the reasoner to answer a base form query with an rdfs:label, it would need to do decomposition, that is, apply property chain axioms in reverse to unpack rdfs:label and infer the existence of a (blank) expression having that label as its base form. Existential instantiation is not what ontology reasoners typically do, so chances for real time queries are slim. But this reasoning could be implemented through some offline processing. The main thing is that the semantics is right. (In the long run, it is counterproductive to avoid ontological questions about expressions. Without ontological commitments, we cannot keep track of our vocabularies: manage sources, authorship, versioning etc.)

The compactification caused by composition is consistent with all TF profiles. TF profiles concern another dimension, namely which TF classes are disjoint and which TF properties are allowed on which TF class members.

TF Compact is an example of OWL turning what first looks like a syntactic format conversion into a matter of description logic reasoning. Instead of running a syntactic conversion script, we reason with a bridge ontology. One might say it is all just a matter of semantics in the end.

With property composition, we can make exact sense of the Aristotelian intuition that the form-matter relationship is a relative one and forms a scale. To express that a special language concept is designated by an expression that in turn has a general language meaning, we go over duals like this:

C term:designatedBy E sign:designates M

where C meta:subClassOf M . Adding decomposition into terms and signs, this becomes

C term:referentOf T term:hasDesignation E sign:formOf S sign:hasMeaning M

where T is a special language term and S is a lexicographic sense of the same expression E.

A mapping of TF Compact into minimalist pure RDF could be obtained if we could define the composition of object property designatedBy with datatype property exp:baseForm and subsume it as subproperty of rdfs:label , hiding the language code as a datatype string attribute in the string value:

<http://tfs.cc/exp0/Finnish> rdfs:label "suomi"@fi .

Unfortunately, current OWL 2 does not cover composition of object and datatype properties. (We could express it with rules.)

TF ontology subsets

The core TF schema TFS.owl only defines the skeleton of the TermFactory special language term ontology. In its intended model, a term is a pair of a designation and a referent. The designation is a natural language expression. The referent can be a general concept (class pun) or a named instance (a person, building, country etc.). TFS.owl does not entail all the properties of this intended model, not even all those expressible in OWL. More properties of the intended model can be imported from schema extensions as they are needed. In particular, they may be useful for integrity testing, to make sure that a term ontology reflects the intended model. The point is that not all of the properties are always needed, and the more of them are asserted, the more work there is for reasoning. In particular, axioms which entail separation of classes and distinctness, existence or uniqueness of instances are expensive to reason with, because they increase the size of the model by multiplying the number of entailments and blank nodes. Such complicating OWL axiom types include owl:disjointWith, owl:differentFrom, owl:someValuesFrom, owl:FunctionalProperty, and rdfs:domain/rdfs:range restrictions.

The WordNet to TF bridge TFwn.owl places the WordNet class Synset as a subclass of Meaning. WordNet senses become instances of Sign (with property term:approximate "true" as default). WordNet synsets differ from TF Full style concepts by having an inherent part of speech, which in TF Full is a form property. Also, WordNet senses have a label, which in TF Full is a form property. This makes WordNet an instance of the Legacy profile with some traces of Lite: Synsets are instances of Meaning but they have part of speech; Senses are instances of Sign but they have base forms. There is an additional WordNet class Word that contains words (Forms). See the section on morphology.

The following table and graph describe the subsets of the TF top ontology. The labeled clusters and connectors depict conceptual and functional groupings, not OWL import relationships. There are no built in import relationships between the subsets, to allow free mixing and matching of the subsets in different combinations.

TF subsets
Subset gloss
TFS.owl contains just enough to support day-to-day business with validated TF terms.
TFTop.owl extends TFS.owl for general language vocabulary, in particular,
TFwn.owl bridges Princeton WordNet to TFTop.owl.
TFSem.owl extends TFS.owl for NL semantics, in particular, semantic role frames.
TFProp.owl contains a taxonomy of TF properties for querying and conversion.
TFL10n.owl and its language specific subsets TFL10n-??.owl localize TF vocabulary.
TFLang.owl contains ISO language codes in TF format.
TFCtry.owl contains ISO country codes in TF format.
TFDom.owl contains the TF subject field classification for querying and browsing.
TFStrict.owl contains integrity constraints for validating a TF term ontology.
DictionaryStrict.owl separates terms from expressions.
LegacyStrict.owl separates terms from concepts.

The rest of this section is organised as a set of frequently asked questions about modeling classical terminology theory in TF.

Q: Has every term got a designation and a referent?
A: Designation is a named class in TFS. It could be defined as term:designationOf Some Term . Referent is not a named class in TFS. It could be defined as term:referentOf Some Term . Expression is a superclass of Designation that also contains Text. TFS.owl defines object properties term:hasDesignation and term:hasReferent . They are classified as basic term properties in TFProp.owl:
<owl:ObjectProperty rdf:about="&term;hasDesignation"> <rdfs:subPropertyOf rdf:resource="&meta;termObjectProperty"/> <rdfs:subPropertyOf rdf:resource="&meta;basicObjectProperty"/> </owl:ObjectProperty> <owl:ObjectProperty rdf:about="&term;hasReferent"> <rdfs:subPropertyOf rdf:resource="&meta;termObjectProperty"/> <rdfs:subPropertyOf rdf:resource="&meta;basicObjectProperty"/> </owl:ObjectProperty>
Q: Is the designation of a term always an expression?
A: Yes. The range restriction is asserted in TFS.owl:
<owl:ObjectProperty rdf:about="&term;hasDesignation"> <rdfs:range rdf:resource="&exp;Expression"/> </owl:ObjectProperty>
Q: Is the referent of a term always a concept?
A: No. It can be any TF object. The range restriction is asserted in TFS.owl:
<owl:ObjectProperty rdf:about="&term;hasReferent"> <rdfs:range rdf:resource="&meta;Object"/> </owl:ObjectProperty>
Q: Do only terms have designations?
A: Yes. The domain restriction is asserted in TFS.owl:
<owl:ObjectProperty rdf:about="&term;hasDesignation"> <rdfs:domain rdf:resource="&term;Term"/> </owl:ObjectProperty>
Q: Do only terms have referents?
A: Yes. The domain restriction is asserted in TFS.owl:
<owl:ObjectProperty rdf:about="&term;hasReferent"> <rdfs:domain rdf:resource="&term;Term"/> </owl:ObjectProperty>
Q: Does a term have a unique designation?
A: Yes. Functional property assertion in TFStrict.owl:
<owl:ObjectProperty rdf:about="&term;hasDesignation"> <rdf:type rdf:resource="&owl;FunctionalProperty"/> </owl:ObjectProperty>
Q: Does a term have a unique referent?
A: Yes. Functional property assertion in TFStrict.owl:
<owl:ObjectProperty rdf:about="&term;hasReferent"> <rdf:type rdf:resource="&owl;FunctionalProperty"/> </owl:ObjectProperty>
Q: Can two different terms have the same designation and referent?
A: No. A term is uniquely identified by its designation and referent. Uniqueness assertion in TFStrict.owl:
<owl:Class rdf:about="&term;Term"> <owl:hasKey rdf:parseType="Collection"> <rdf:Description rdf:about="&term;hasDesignation" /> <rdf:Description rdf:about="&term;hasReferent" /> </owl:hasKey> </owl:Class>
Q: Are the term and its designation different things?
A: Asserted in DictionaryStrict.owl:
<rdf:Description rdf:about="&term;hasDesignation"> <rdf:type rdf:resource="&owl;IrreflexiveProperty"/> </rdf:Description>
Q: Are the term and its referent different things?
A: Asserted in LegacyStrict.owl:
<rdf:Description rdf:about="&term;hasReferent"> <rdf:type rdf:resource="&owl;IrreflexiveProperty"/> </rdf:Description>
Q: Are terms and expressions disjoint classes?
A: Disjointness asserted in DictionaryStrict.owl:
<owl:Class rdf:about="&term;Term"> <owl:disjointWith rdf:resource="&exp;Expression"/> </owl:Class>
Q: Are terms and concepts disjoint?
A: Disjointness asserted in LegacyStrict.owl:
<owl:Class rdf:about="&term;Term"> <owl:disjointWith rdf:resource="&ont;Concept"/> </owl:Class>
Q: Are designations and referents disjoint?
A: Not asserted either way. Counterexample: a quoted expression is an expression which designates an expression.
Q: Are all expressions designations of some term?
A: Not asserted either way.
Q: Are all concepts referents of some term or other?
A: Not asserted either way.
Q: Can a concept be referent of more than one term?
A: Not asserted either way.
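
The functional property and key assertions above can be read as integrity tests on a term ontology. The Python sketch below illustrates that reading under stated assumptions: triples are plain tuples, the helper names and the identifiers t1, e1, c1 and so on are hypothetical, and distinct names are treated as distinct things. An OWL reasoner makes no such unique name assumption; unless the resources are asserted to be different, it would infer that they are the same rather than report a violation.

```python
from collections import defaultdict

def functional_violations(triples, prop):
    """Subjects that have more than one distinct value for prop."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: vals for s, vals in values.items() if len(vals) > 1}

def key_collisions(triples, key_props):
    """Groups of distinct subjects that share values for all key_props."""
    values = defaultdict(dict)
    for s, p, o in triples:
        if p in key_props:
            values[s][p] = o
    groups = defaultdict(set)
    for s, vals in values.items():
        if len(vals) == len(key_props):  # only subjects with a complete key
            groups[tuple(vals[p] for p in key_props)].add(s)
    return [subjects for subjects in groups.values() if len(subjects) > 1]

# Hypothetical data: t1 and t2 share a key, t3 has two designations.
triples = [
    ("t1", "term:hasDesignation", "e1"), ("t1", "term:hasReferent", "c1"),
    ("t2", "term:hasDesignation", "e1"), ("t2", "term:hasReferent", "c1"),
    ("t3", "term:hasDesignation", "e2"), ("t3", "term:hasDesignation", "e3"),
]

print(functional_violations(triples, "term:hasDesignation"))
print(key_collisions(triples, ["term:hasDesignation", "term:hasReferent"]))
```

The first check corresponds to the functional property assertions, the second to the owl:hasKey assertion on term:Term.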

TermFactory properties

The main classification tools in an OWL ontology are class and property hierarchies. A good TF sub-ontology bridges its top concepts (at least indirectly) to the TF schema and checks predefined classes and properties for fit before subclassing or inventing its own. It is good ontology writing practice to use classes ( rdf:type exp:English ) instead of distinguishing properties (like exp:langCode "en" ) for classification when workable, because description logic reasoners and editors support classes best. (The TF schema actually defines the class exp:English as the class of those things whose langCode is "en".) It is also good practice to build subproperty trees instead of a long flat list of properties. Subproperty relations allow alternative views on related properties and support querying properties by class instead of by name. This is particularly important with repository specific new types of properties. A query engine is able to return meaningful subsets on the basis of subproperty classifications without having to know the individual properties. This is valuable when the ontology collection is composed of contributions from independent sources.

TF divides properties first of all into properties of terms, expressions, and concepts. A group apart are semantic properties in the sem namespace, used in the TF natural language interface. To make the classification of properties as concept, term, or expression properties explicit, we build explicit hierarchies of TF object and data properties in TFS.owl. Here is a representative sample:

meta:objectProperty meta:conceptProperty meta:termProperty meta:expressionProperty

This property hierarchy is kept in a separate ontology document TFProp.owl . In it, each property also has a paronymous instance (pun) in namespace meta0. The puns are used in localising property names. (Paronyms for properties are needed in OWL 2, because even OWL 2 does not allow punning property names.) The puns are classified into a property instance classification that runs parallel to the property hierarchy:

meta:ObjectProperty meta:ConceptProperty meta:TermProperty meta:ExpressionProperty

The property classification simplifies the statement of property ranges, since the range of each type of property can be given just once for the type. Other properties inherit it from the type(s) they belong to.
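
Querying properties by class instead of by name can be sketched as follows. The fragment is a hedged illustration, not TF code: the helper names are hypothetical, and repo:hasLocalVariant stands for an invented repository specific property. The classification of term:hasDesignation and term:hasReferent under meta:termObjectProperty is taken from the TFProp.owl excerpt quoted earlier.

```python
def subproperties(sub_property_of, top):
    """All properties at or below `top` in the subproperty hierarchy."""
    found = {top}
    frontier = [top]
    while frontier:
        parent = frontier.pop()
        for child, sup in sub_property_of:
            if sup == parent and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def triples_by_property_class(triples, sub_property_of, top):
    """Select triples whose property falls under `top`."""
    wanted = subproperties(sub_property_of, top)
    return [t for t in triples if t[1] in wanted]

sub_property_of = [
    ("term:hasDesignation", "meta:termObjectProperty"),
    ("term:hasReferent", "meta:termObjectProperty"),
    ("repo:hasLocalVariant", "term:hasDesignation"),  # hypothetical property
]
triples = [
    ("t1", "term:hasReferent", "c1"),
    ("t1", "repo:hasLocalVariant", "e9"),
    ("e9", "exp:baseForm", "suomi"),
]

term_triples = triples_by_property_class(
    triples, sub_property_of, "meta:termObjectProperty")
print(term_triples)
```

The query never names repo:hasLocalVariant, yet the triple using it is returned because the property was classified under an existing TF property.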

Term properties

In the TermFactory terminology model, a term is a special case of a two-faced de Saussurean sign that links a meaning or referent (signifié) with a form or expression (signifiant). In practical terminology work, the distinction between term and expression is not always made explicitly. In TF, it is an option motivated by multitopicality and multilinguality. Among other things, it allows separation of those properties that expressions have by general-language grammar from those properties that are associated to them only in a specific domain-dependent or special-language meaning.

The TF expression-term-concept triad instantiates de Saussure's notion of dyadic sign (Wikipedia s.v. sign). It is not identical to the triangle of reference or semiotic triangle of Peirce, Frege, and Ogden/Richards. In the semiotic triangle, the corners are named symbol (sign), thought (reference), and object (referent). Here, expression and sign are not distinguished. Instead, there is a split on the meaning side between intension and extension. This distinction goes back to Plato and Aristotle. The corners of the semiotic triangle are word (symbol/sign), mental affection (thought/intension), and thing out there (referent/extension). Aristotle's words-as-symbols or signs match TF expressions-as-terms (these two are not separated). The semiotic triangle thus corresponds to the TF Legacy profile. Mental affections (intensions) are best matched with TF classes and concepts. Things out there (extensions) have a reflex in TF instances, which denote such things.

In terminology literature a similar figure is known as the terminology triangle. Its corners are concept, term and object.

With respect to the TF triad, terminological properties tend to fall into three groups: those which hold of a concept independent of its expression (concept properties); those which hold of the expression (expression properties), and those which hold of the combination, or of one side relative to the other (term properties). However, TF does not take too strict an approach to the tripartition. Sometimes an expression property is better associated to a term (if the expression is too loose). Sometimes a meaning property is more correctly associated to a term (if the concept is too loose).

TF tries to avoid the necessity to postulate dummy elements just to satisfy type requirements. A simple glossary may not warrant creating even dummy concepts for the terms it describes. In TF, descriptions can be associated to concepts, terms or expressions. A TF ontology need not contain a single term to be well formed. There is no notion of "minimum entry" in TF. Any set of triples conformant to the TF schema can constitute a TF document.

By default, the denotation relation term:hasReferent between a term and a concept is taken to be exact. When it is not, exactness can always be restored by positing intervening concepts which match their terms exactly and whose extensions bear various subclass relations to one another (definable by rdfs:subClassOf). This solution is preferred because it renders the semantic relations transparent to OWL reasoning. As a shorthand, the TF schema also allows expressing degree of equivalence with property term:approximate with literal values false|true|super|sub. These shorthands are not OWL, so they have no reasoner support unless given axioms or preprocessed to OWL. (Compare also TF Lite.)

Two terms are approximately equal if they have nonempty meet and join, i.e. they are non-disjoint co-hyponyms. This much is implied by term:approximate value true . The rest of the values are interpreted analogously: sub says that the exact denotation of the term is included in the given referent, and super says the exact denotation is wider than the given referent. The default value is false meaning that the match is exact (not approximate). A related TBX feature directionality connects semantically closest neighbors between approximate terms around some concept. This facility is not supported in TF as such.
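
The four values can be illustrated with a small sketch. It assumes, purely for illustration, that the exact denotation of a term and the extension of its stated referent are available as finite sets; the function name is hypothetical and TF itself attaches no such computation to term:approximate.

```python
def approximate_value(term_denotation, referent_extension):
    """Classify how a term's exact denotation relates to its stated referent."""
    d, r = set(term_denotation), set(referent_extension)
    if d == r:
        return "false"   # exact match, not approximate
    if d < r:
        return "sub"     # the denotation is included in the referent
    if d > r:
        return "super"   # the denotation is wider than the referent
    if d & r:
        return "true"    # overlapping, neither includes the other
    return None          # disjoint: not even approximately equivalent

print(approximate_value({"a", "b"}, {"a", "b"}))
print(approximate_value({"a"}, {"a", "b"}))
print(approximate_value({"a", "b", "c"}, {"a", "b"}))
print(approximate_value({"a", "c"}, {"a", "b"}))
```

In the preferred OWL encoding the same distinctions would be expressed by subclass relations between intervening concepts rather than by a literal value.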

The degree of equivalence can be approximate when an expression is associated to a concept of another locale (culture) by way of translation. However, the property of being a translation is not another degree of equivalence. In TF, a term can be recognized to be a translation from a disparity between language code and country code . If the language and country codes associated to the term (either on the term itself or on one of its two sides) do not agree, the term is not vernacular, i.e. it is a translation.
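
A minimal sketch of the test follows. How agreement between language and country codes is decided is not specified here, so the lookup table VERNACULAR and the function name are assumptions made for the example only.

```python
# Hypothetical table of which languages count as vernacular in which country.
VERNACULAR = {
    "FI": {"fi", "sv"},
    "SE": {"sv"},
    "GB": {"en"},
}

def is_translation(lang_code, country_code, vernacular=VERNACULAR):
    """True if the language does not agree with the country, False if it does.

    Returns None when the country is not in the table, since there is then
    no basis for a decision.
    """
    if country_code not in vernacular:
        return None
    return lang_code not in vernacular[country_code]

print(is_translation("fi", "FI"))  # vernacular term
print(is_translation("en", "FI"))  # English term for a Finnish concept
```

In a TF repository the two codes would come from the term itself or from one of its two sides, as described above.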

Term property term:hasTransferComment and the description type term:TransferComment pertain to the TF Lite use case. Many of its uses can be recoded in TF Full in ways that are more transparent to automatic reasoning.

The terminology standard property termType takes as values such types of terms as phrase, full form, short form, variant, abbreviation, acronym. These become subclasses of class Term. Phrase subclasses Expression because phrasehood is largely parsable from the expression independent of its meaning. As in linguistics, TF countenances one word phrases. The terminological notion of phrase denotes phrases which are not words. Perhaps we want another notion for a terminological phrase (one that has two or more words as compositional parts).

As for the remaining values, the assumption is that an expression in a given meaning can be recognized from its form to be of one or another of these types, as defined in the standard. (For instance, a variant means an alternative spelling or allomorph of another form.) These can be considered term properties on the presumption that it is either computable or uninteresting just which other form(s) each form is related to as, say, abbreviation for or variant of. If so, they become classes like exp:Abbreviation or properties like term:shortFormFor depending on the case. (Class term:ShortForm can be defined in OWL from property term:ShortFormOf.)

A few TBX term types, like antonym of, false friend with, homograph with, should perhaps go unanalysed in TF schema, if one wants to reason with them. Antonymy is a second order property related to the OWL notion of inverseOf; false friends and homographs are relations between terms with similar expressions but different referents. But an imported vocabulary can of course keep its own notion of antonym.

Property term:register holds a picklist of usage register values. Register tells about the situation where (to what sort of addressee, in what audience) the expression can be used. Possible values might include high, low, formal, scientific, poetic, colloquial, technical, slang, familiar, honorific, vulgar. For values, see http://www.isocat.org#interface/index.html .

Property term:usage takes picklist values describing how generally (by what part of the population) a term is used. Values might include rare, common, dialectal, obsolete. There is a slippery slope here toward register (some labels like learned might describe usage or register).

Property term:attitude lists picklist values for what an expression tells about the attitude of the author to the referent. Values might include meliorative, pejorative, augmentative, diminutive, euphemistic, hypocoristic. These three data properties are not functional.

Descriptions

Descriptions, in terminology theory, are usually textual descriptions of concepts or terms. In TF, a description is any object that describes another object. TF schema leaves it open what type of TF object a description is. It makes room for descriptions which are not textual. In terms of the TF ontology, a description can be anything: a form, a meaning, or a sign. We want to keep TF terminologies flexible about multilinguality and semantic explicitness. For this reason, TF descriptions can be signs or meanings, and they can be associated to any TF objects. This tolerance allows associating language-independent descriptions to meanings and language-dependent descriptions to signs. The former tactic is the natural one when one plans to machine generate language-dependent descriptions from a common interlingua. The latter tactic allows semantically unanalysed localised descriptions which have no equivalents in other languages.

The latter approach offers itself if the subject field has not been subject, or is not amenable, to concept analysis. This happens in new fields and areas where meanings are sacrosanct or a bone of contention (like religion or politics). This is where the whole distinction between concept and term is somewhat artificial, with little basis or even interest in distinguishing the concept from the term. (See TF schema profiles .)

Since TF is weakly typed, we can leave types of items open and reify them later. A language specific explanation can start out as just an exp:Text node with a string associated to a specific term in that language. TF does not try to enforce any one discipline here. The character of the link can be inferred from the types of the entities at the two ends. If the explanation gets translated to more languages, it starts to make sense to create an ont:Content node for the shared content to keep the translations together. If the explanation link is kept as it was, other translations for the explanation are accessible over its sign:hasContent link. But if a multilingualised explanation is associated to a language independent concept, it makes more sense to create an explanation link between the contents.

TF has not got the TBX notion of a language set as a container keeping together items in a given language related to a concept as a primitive. But nothing prevents an application that wants to create such collector nodes, for instance as an intermediary step to TBX conversion, from constructing them using SPARQL for instance.

Two well known subclasses are definitions and explanations. Explanations are descriptions which explain the use of a concept but do not have to separate it from other concepts. In TF, term:Explanation is a subclass of term:Description .

Definitions

There are two traditions for what definitions are or should be like. The Aristotelian-terminological tradition places a concept in a taxonomical system of concepts in terms of a superordinate concept (genus) and distinguishing features (differentiae). A definition in this tradition is the definiens, a phrase of the same syntactic category as the definiendum that can be substituted for the definiendum salva veritate . In the logical tradition, a definition is a theory that fixes the meaning of a term in some model (or class of models). Such a definition can be explicit (an equivalence whose sides are the definiens and definiendum) or implicit (a sentence or set of sentences in which the definiendum occurs).

TermFactory countenances both types of definitions. An explicit concept definition of the terminological type is a textual sign of type term:Definition that bears relation term:definitionOf (a subproperty of sign:hasMeaning co-hyponymous with term:referentOf ) to the concept it defines.

A logical definition, in an explicit sentential equivalence form or an implicit or contextual definition, is a description of type sign:Definition that bears the relation sign:definitionOf to another similar object. It does not entail that the sign and its definition are intersubstitutable as such.

Finally, there is a simple data property meta:definition that just associates a string to something else as its definition. This type is provided for TF Lite, and expected to go away in conversion from Lite to Full.

OWL has its own definitional primitives owl:sameAs and owl:equivalentClass . For a TF definition to be correct, the definiens and its definiendum must be logically equivalent in OWL terms, i.e. the relation owl:equivalentClass (or owl:sameAs in the case of individuals) holds between the referents of the definiens and the definiendum. The definiens and the definiendum can have (in terms of sign:hasMeaning ) the same TF meaning but they need not. The meanings can also be different resources (have different URIs), as long as they are OWL equivalent. In general, a concept can have many extensionally equivalent, but intensionally different definitions, all equally correct.

Circular definitions are not incorrect, but they may be uninformative. If we want to use reasoning to prevent circular definitions in TFS, term:hasDefinition must not be an equivalence relation, but a strict partial order (irreflexive, transitive, asymmetric). It is an intensional refinement (a spanning tree subset) of the equivalence relations owl:sameAs/owl:equivalentClass .
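Short of full reasoning, circularity can also be checked procedurally. The sketch below assumes that definition links have been extracted as pairs (defined concept, concept used in its definition). The pairs and the function find_cycle are hypothetical. A strict partial order has no cycles, so any cycle found marks a circular definition.

```python
def find_cycle(edges):
    """Return a list of nodes forming a cycle in the directed graph, or None."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node, path):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                # nxt is already on the current path: a cycle is closed
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        colour[node] = BLACK
        path.pop()
        return None

    for node in graph:
        if colour[node] == WHITE:
            found = visit(node, [])
            if found:
                return found
    return None

# Hypothetical definition links, for illustration only.
acyclic = [("ont:Parser", "ont:Program"), ("ont:Program", "ont:Software")]
circular = acyclic + [("ont:Software", "ont:Parser")]

print(find_cycle(acyclic))
print(find_cycle(circular))
```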

TF (v 1.5) embeds class term:Description in a more generic class meta:Description . In terminology theory, it is a requirement that descriptions (definitions, examples etc.) have source indications. term:Description requires source. General language dictionaries, on the other hand, don't usually give sources. The more generic class meta:Description does not require sources.

Concept properties

At the top level, the TF triad divides signs into form and meaning. A related distinction that cross-cuts meaning is the duality between referential, object-like entities (which exist or not) and informational, proposition-like entities (which are true or false). On the object side, a TF ontology contains terms that relate designations and concepts. On the proposition side, a TF ontology contains messages with text and content . In each case, there is some common kernel which gets reified into a language-independent concept on the one hand, or language independent content on the other. The "moving parts", the language specific frills, get associated to this kernel as so many expressions for it.

The TF class ont:Concept is the class of first-order representatives, or puns, of OWL classes. ont:Content stands for language independent meanings with a propositional semantics (usually, the interlingua used is English, but it can also be some other language, formal or natural.) exp:Text stands for its concrete expression in some language. term:Description is a role for (mostly propositional) representations of concepts, including definitions, explanations and the like.

It is a small innovation to extend the TF meaning-sign-form triad to propositional content. The extension allows associating multilingual definitions etc. to a concept in one go, leaving it to the system to sort out language-specific versions. (TBX tries to do something similar with its language set grouping, but the connection between matching language versions of the same definition is not explicit in it.)

A third central logical distinction is the distinction between individuals (tokens) and classes (types). This distinction is represented in OWL DL with the first order logic class owl:Individual , second order metaclass owl:Class and property rdf:type that expresses class membership between an individual and a class. In terminology theory, the distinction between individual and class corresponds to the distinction between (general) concepts, designated by (general) terms, and individual (concept)s, whose designations are sometimes known as nomenclature or appellations (ISO 1087-1:2000:2,6).

Product or brand names (such as CocaCola), like many other type-token ambiguous terms, are undecided vis-à-vis the individual/class distinction. As names of individual companies, they act as proper names. As names of product (line)s, they behave as classes with instances. The individual/class distinction is not absolute: what should count as an individual and what a class is a practical question that depends on what kind of reasoning support is required. If the ontology is to support stock accounting, it may be better to treat CocaCola as a class and cans as individuals. If the ontology is for tracking company news, CocaCola can be treated as an individual.

Stock keeping units and part numbers can be accommodated as well. TF is not a tool for real time stock accounting, but it can be useful to attach a TF ontology to such a system for semantic search or localisation purposes.

Traditional terminology theory divides concept relations into three types:

  1. generic (intensional, concept) relations. These are expressed in TF with relations rdf:type and rdfs:subClassOf plus OWL primitives like owl:disjointWith .
  2. part-whole relations. These are in TF represented with the relation ont:hasPart and its inverse ont:partOf .
  3. the rest, lumped together as functional relations. These are represented in TF by whatever seems fit in each case.

A further important concept property in TF is the property meta:hasDomain . It relates meanings to a thesaurus, or taxonomy of concepts naming fields of human inquiry. It codes a looser thematic relation between concepts, like the broader/narrower term relation in bibliographical thesaurus construction. Such thematic relations can be construed as inclusion relations between document collections: a concept is narrower than another if documents about the former are included in documents about the latter. In TF, this relation gets explicated through the domain (subject field) classification. A TF concept is thematically narrower than another concept if the range of the meta:hasDomain property for the former is included in the range of that property for the latter. Thus the thematic narrower than (or more properly, narrower or equal) relation includes, but is not exhausted by, the rdfs:subClassOf relations. For example, partOf relations may entail thematic inclusion as well (car parts belong to the domain of cars).
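The inclusion test described above can be illustrated with a small sketch. The domain assignments and the function thematically_narrower_or_equal are hypothetical. The sketch compares only the directly listed meta:hasDomain values and ignores any hierarchy among the domains themselves.

```python
# Hypothetical meta:hasDomain values per concept (names are illustrative only).
HAS_DOMAIN = {
    "ont:Car": {"dom:Vehicles", "dom:Transport"},
    "ont:SteeringWheel": {"dom:Vehicles"},
    "ont:Parser": {"dom:Computing"},
}

def thematically_narrower_or_equal(a, b, has_domain=HAS_DOMAIN):
    """True if every domain listed for concept a is also listed for concept b."""
    domains_a = has_domain.get(a, set())
    domains_b = has_domain.get(b, set())
    # A concept without any domain information is left out of the comparison.
    return bool(domains_a) and domains_a <= domains_b

print(thematically_narrower_or_equal("ont:SteeringWheel", "ont:Car"))
print(thematically_narrower_or_equal("ont:Car", "ont:SteeringWheel"))
```

Here the steering wheel comes out as thematically narrower than the car although it is a part, not a subclass, of it.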

Frequency is represented as a property meta:frequency taking real number values and a picklist-valued term property term:usage with traditional values like common|rare .

Geographical usage is indicated by the picklist property ctryCode with values in the ISO standard of country codes.

Connotations

The TermFactory ontology schema contains property sem:hasConnotation for connotations, that is, non-denotational semantic associations, other entities that a given entity "brings to mind". Expressions, terms, and concepts all have connotations. Unlike denotations, connotations tend to spread by association. They may or may not be definable in terms of more specific relationships (i.e. there may or may not be a cause or good reason for a connotation). Connotations can be associated to any ont:Object .

Often, connotations are good or bad, i.e. they associate a value judgment to something. A value judgment is a subjective preference by a value subject. A preference is a binary comparative relation. To express that death has negative connotation (in common opinion), one may write ont:Death sem:hasConnotation sem:Bad . In this case, we can probably say more: death does not just connote something bad, it actually is a bad thing (in common opinion), i.e. ont:Death rdfs:subClassOf sem:Bad .

Expression properties

TF distinguishes expressions from terms. Expressions are language-specific strings with grammatical identity and properties and (hence) some implicit general-language base meaning. Nothing prevents describing general-language meanings explicitly in TF, in which case those general-language form-meaning pairings appear as TF signs. For instance, general cross domain vocabulary relevant to information extraction might be stored in TF in just this way. Also it can be useful to be able to relate new or unstable special language meanings to their general language homonyms (e.g. long-term unemployed ). Rich general-language resources on the expression side make it possible to machine generate from one "terminological lemma/lexeme" a maximum variety of compositionally predictable derived forms without separate listing.

The term-expression distinction offers two ways of representing term elements, or such parts of terms which deserve to be identified in TF. Parts which have grammatical relevance but no terminological meaning appear as expressions. Those parts which have independent terminological meaning become terms of some less prominent grammatical category. There is nothing in TF to prevent adjectives, adverbs, compound parts, roots or affixes from being terms. (But compare below.) Lexicalized phrases like handle with care are accommodated in the sister subclass sign:Message of sign:Sign . Other fixed special language textual units like standard formulas or texts might be included here as well. Note that TF does not make a distinction in kind between "main" and "accessory" units like terms vs. definitions. They are all equally objects of description in TF.

An obvious property for expressions (but not exclusively for them) is exp:langCode with values in the ISO standard of language codes.

Another central property is exp:catCode with a picklist of values abbreviating parts of speech. These part of speech codes are associated with corresponding expression classes which carry the same code. No confusion here: though the ontology class exp:Adjective carries catCode A, the English expression en-adjective-N has catCode N.

A third crucial property of expressions is exp:baseForm which spells out the lemma or base form of an expression. A base form can be any valid XML, i.e. properly nested mixture of Unicode strings and XML tags.

Other properties

Some properties are too generic to fit into any of the above categories. The meta namespace houses properties which contain TF metadata.

Terminology theory insists on documentation of sources. In TF, there is a class meta:Source for source indications. It has one built in datatype property which is the URI of the indicated source. More properties may get added. Source indications are associated to terminology objects with object property meta:hasSource . Any TF objects can get sources. Some convention may be needed about default propagation of source indications.

Database subsetting should not call for special machinery in TF over and above namespaces and user definable OWL classes/properties. Boolean property hidden is an example of ad hoc repository management.

rdfs:comment is the annotation property to use for ad hoc metalevel comments. No particular constraints apply to what such annotations may contain. owl:versionInfo is used for version management. There should be only one version info element in each TF ontology version. (Old version info items can be saved as some other annotation such as a comment.)

There may be a need for symmetric cross-references between objects of various types. The TF top pointer property is meta:see . The name is an exception to the rule that object valued properties should be defined as hasSomething. This is because see properties are symmetric (self inverse). The subproperties can have names like term:seeFalseFriend or seeHomograph .

The current TF schema is only a beginning. TF namespace owners are free to invent new properties. It is good to derive new properties as subproperties of existing ones, to reduce clutter and help query data by property type. If successful, they may migrate higher up in the pyramid and eventually end up in the TF schema.

TF linguistics

One of the goals of the explicit machine-processable expression ontology is to make it possible to parse and generate natural language text from and to TF ontologies. Using a natural language generator and the TF schema vocabulary, it should be possible to generate definitions and explanations of concepts from an ontology automatically in the different languages covered by the expression ontology. Using a natural language parser and the TF schema vocabulary, it should be possible to parse natural language queries to the ontology and to carry out natural language commands from users. Putting this together, TF could support a multilingual human-machine dialogue system. In particular, TF should be able to explain and modify itself in natural language. The cparse NL multilingual parser/generator is going to be tested on TF vocabularies for multilingual parsing and generation of term definitions.

The EU MOLTO project shall apply Aarne Ranta's Grammatical Framework interlingual translation system to special domain translation. The special domains will be described as OWL ontologies. TF is to be used as a vocabulary acquisition tool in the project.

Here is a design sketch for the connections among MOLTO and TermFactory components.

Morphology in TF

In special language terminology, morphology plays a minor role. Terms as lexical innovations tend to have simple morphology. In terminology collections, expressions associated to concepts are given in some base form largely in abstraction of derivational or inflectional morphology. This is not always efficient or illuminating or even sufficient. For instance, an expression like exp:fi-epäkunnossa-P 'broken' has no base form. Its opposite exp:fi-kunnossa-P is an inflected case of base form exp:fi-kunto-N . Should exp:fi-epäkunnossa-P be related to exp:fi-kunto-N , and if so how?

Here is one proposal. exp:fi-kunnossa-P is simultaneously classified as an exp:Expression and as an inflected syn:Form whose syn:hasBaseForm is exp:fi-kunto-N.

Inflection

In general, when a form is a base form (lexeme), it is an exp:Expression as well as an inflected syn:Form, and related by syn:baseFormOf to its inflexions. The inflected form, which is perhaps not an exp:Expression at all, only a syn:Form, points to the base form with syn:hasBaseForm. For instance, form exp:fi-epäkunnossa-P can be connected to exp:fi-kunnossa-P by some appropriate expression property, say syn:hasDerivedForm.

Except for semantically or terminologically conditioned special cases, morphology is better not enumerated in TF, but by a morphology processor. It hardly makes sense to include paradigm listings into a static ontology document when such listings can be generated by rule. The natural approach is to provide a base form and enough tags and form to allow generating the whole paradigm from them using some morphological processor.

Number and count are best considered properties of concepts (class puns) not instances. An individual instance like ont1:Java can belong at the same time to ont:Program whose pun ont0:Program is count and to ont:Software whose pun ont0:Software is noncount.

Derivation

Say parse is a term in the category Verb. It has a productive agent noun parser. Are they the same term or two? We need to split this into two questions. Pro primo: Are there two expressions (lexemes) here with different resource identifiers? Pro secundo: do we need to enumerate and store them in the repository? The answer to the first question is probably positive. parser, though regularly derived from parse, is another lexeme, with its own set of forms. It belongs to a different part of speech than parse. The associated concepts are also different. One classifies processes, the other people. Assuming people are disjoint from processes, they cannot be the same class. One resource cannot have conflicting properties, so we have two. In another language, the relation of the corresponding terms need not be as predictable, and we want to be able to tell which translates which. On the other hand, if the derivation is fully predictable, we need not store both in the repository. One can be automatically derived from the other using an ontology/rule schema for derivation or an external morphological processor. Only axioms need to be enumerated, theorems follow.

How to implement this in practice needs thought. Assume there are resources en-parse-V and ont:Parse in the ontology, but no resource en-parser-N or ont:Parser. How does one query for 'the agentive noun for the verb parse'? Will a query for string pattern "parse" include the agentive noun? Will a query for "parser" find it? Can one find an entry for en-parser-n_-_ont:Parser if it is not already stored?

For the first question, we might write a special query handler for a triple like en-parse-V exp:hasAgentiveNoun ?x. The query handler for this predicate would dispatch to a special purpose reasoner, perhaps implemented by a morphological processor. For the string match query the query engine would need to expand derivatives to find matches for unlisted strings. As for the last question, parsing unknown resource names on the fly requires rewrite rules that unpack the unknown resource identifier into a query. (Location mappings might just swing it in this particular case, but that is not what they are good for.)
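A query handler of this kind might look roughly as follows. Everything here is an assumption made for illustration: the tiny triple store, the VIRTUAL dispatch table, and above all the toy suffix rule in agentive_noun, which stands in for a real morphological processor.

```python
# Stored triples; identifiers follow the naming pattern used in the text.
STORE = {
    ("exp:en-parse-V", "exp:catCode", "V"),
    ("exp:en-parse-V", "exp:baseForm", "parse"),
}

def agentive_noun(subject):
    """Toy stand-in for a morphological processor: English verb + -er/-r."""
    parts = subject.split(":", 1)[1].split("-")
    lang, lemma, cat = parts[0], "-".join(parts[1:-1]), parts[-1]
    if lang != "en" or cat != "V":
        return []  # the toy rule only covers English verbs
    suffix = "r" if lemma.endswith("e") else "er"
    return [f"exp:{lang}-{lemma}{suffix}-N"]

# Predicates that are answered by computation rather than by lookup.
VIRTUAL = {"exp:hasAgentiveNoun": agentive_noun}

def query(subject, predicate):
    """Return the objects ?x matching the pattern (subject, predicate, ?x)."""
    if predicate in VIRTUAL:
        return VIRTUAL[predicate](subject)
    return sorted(o for s, p, o in STORE if s == subject and p == predicate)

print(query("exp:en-parse-V", "exp:hasAgentiveNoun"))
print(query("exp:en-parse-V", "exp:baseForm"))
```

The first query returns a resource name that is not stored anywhere; the second is an ordinary lookup. The caller cannot tell the difference, which is the point of the design.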

The [<c>=parse] natural language parser/generator was adapted to serve this purpose in the predecessors of TermFactory, the 4M and Cogks dialogue systems (see graph below). The cparse generator has a Java/Jena converter that converts between linguistic feature structures and RDF graphs. There is a tool that converts TF ontologies into cparse dictionaries. cparse version 71 was completed in May 2009. Test grammars have been developed for multilingual parsing and generation of concept definitions between Finnish, English, and Chinese.

CoGKS architecture

TF natural language semantics (TFSem.owl)

The purpose of the TF semantics namespace is to provide enough semantic analysis of general language to support a simple interlingua suitable for typical terminological definitions which can be parsed from and generated to multiple natural languages. The longer term aim is to reduce or obviate the need to manually verbalise definitions that are built following standard rules of terminology from a concept system formalised in TF.

For instance, the concept ont:Parser could be defined in a TF ontology with superclass ont:Program and with property sem:hasFunction value ont:Parse (which in turn could be defined with superclass ont:Analyze having property sem:Object value ont:Text ). This already constitutes an interlingua from which it is not difficult to machine generate simple phrases like a program to analyze text to verbalize the OWL definition. (The language independent parser/generator [<c>=parse] has been tested on this example.)
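To make the idea concrete, here is a deliberately naive verbalizer for the example just given. It is a sketch only and has nothing to do with how the actual parser/generator works. The dictionary encoding of the ontology, the English lexicon and the function names are all hypothetical.

```python
# Hypothetical mini-ontology mirroring the example in the text.
ONTOLOGY = {
    "ont:Parser": {"superclass": "ont:Program", "sem:hasFunction": "ont:Parse"},
    "ont:Parse": {"superclass": "ont:Analyze", "sem:Object": "ont:Text"},
}

# Hypothetical English lexicon: concept -> base form.
LEXICON_EN = {
    "ont:Program": "program",
    "ont:Analyze": "analyze",
    "ont:Text": "text",
}

def verbalize_event(concept):
    """Render an event concept as 'verb object' using its superclass as the verb."""
    frame = ONTOLOGY[concept]
    verb = LEXICON_EN[frame["superclass"]]
    obj = frame.get("sem:Object")
    return f"{verb} {LEXICON_EN[obj]}" if obj else verb

def verbalize(concept):
    """Render a concept as 'a <genus> to <function>' (genus plus differentia)."""
    frame = ONTOLOGY[concept]
    genus = LEXICON_EN[frame["superclass"]]
    article = "an" if genus[0] in "aeiou" else "a"
    phrase = f"{article} {genus}"
    function = frame.get("sem:hasFunction")
    if function:
        phrase += f" to {verbalize_event(function)}"
    return phrase

print(verbalize("ont:Parser"))  # a program to analyze text
```

Swapping in a lexicon for another language would change the words but not the traversal, which is what makes the concept graph usable as an interlingua. Word order and inflection would of course need a real grammar.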

Compare also the ontology design patterns website.

Verbs

The TF general language semantics builds on the TMAD model of tense, mood, aspect, and diathesis. In this model, verbs denote event types, whose instances are events. Event types are built up from states using a regular algebra. States connect timeless OWL classes to the event ontology: a class (instance) like ont0:Person is related to event type sem:State by bearing role sem:predicateOf to an instance of state sem:Be . An instance of the class carries the role sem:subjectOf to the state. Reflecting this construction, event aspect splits up in a four-field of states, processes, changes and cycles. Examples of each are be, breathe, die, blink , respectively. Each event type has roles associated to it. All event types have time and place. States have subject, changes and cycles have source, goal, path, transitives have agent, object and instrument, animates have aims, means, and function.

The English verb be denotes a state. Jesus is savior translates to Jesus sem:subjectOf _:s . _:s rdf:type sem:Be . _:s hasPredicate ont0:Savior and entails Jesus rdf:type ont:Savior .
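The entailment in this example can be spelled out as a small rule over triples. The sketch below is an illustration under assumptions: the function names are made up, and the mapping from a pun such as ont0:Savior to its class ont:Savior is done by stripping the trailing 0 from the prefix, which is merely the naming habit visible in this document, not a TF rule. Time is ignored altogether.

```python
TRIPLES = {
    ("Jesus", "sem:subjectOf", "_:s"),
    ("_:s", "rdf:type", "sem:Be"),
    ("_:s", "hasPredicate", "ont0:Savior"),
}

def pun_to_class(pun):
    """Map a class pun like ont0:Savior to the class ont:Savior (naming assumption)."""
    prefix, local = pun.split(":", 1)
    return f"{prefix.rstrip('0')}:{local}"

def project_class_assertions(triples):
    """Derive timeless rdf:type triples from sem:Be states."""
    states = {s for s, p, o in triples if p == "rdf:type" and o == "sem:Be"}
    inferred = set()
    for state in states:
        subjects = [s for s, p, o in triples if p == "sem:subjectOf" and o == state]
        predicates = [o for s, p, o in triples if s == state and p == "hasPredicate"]
        for subj in subjects:
            for pred in predicates:
                inferred.add((subj, "rdf:type", pun_to_class(pred)))
    return inferred

print(project_class_assertions(TRIPLES))
```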

Basic OWL class assertions have states in ont:ClassState . They are timeless: their event instances _:c satisfy subjectOf _:c subClassOf predicateOf _:c . An example is Fido is a dog . For class states, sem:subjectOf is effectively a first-order pun of the membership relation rdf:type between an instance and class. For most states, only stages can be projected to class triples. For instance, Jesus was a baby in 1 A.D. , represented by _:e hasTime 1_AD . _:e hasSubject Jesus . _:e hasPredicate Baby . only projects (entails) staged timeless class assertions like Jesus_in_1_AD rdf:type Baby . or Jesus rdf:type Baby_in_1_AD .

The TF event algebra allows the whole gamut of descriptive choices from syntactic event types to the dual view of state based temporal logic models. Which modeling we use may depend on which is shortest, and we can mix the views. Recall the ideas of temporal model system as diffs to a model, the duality of objects and events, the duality of events and time, etc.

More event types get constructed with event modals such as causation (transitives) and animacy (agentives). Event type BecomeEvent is a change that shares subject. Event type CauseEvent is a connection of two events, one the cause of the other. Event type MakeEvent is the cause-become frame. Event type DoEvent is the frame for animate agency.

Nouns

Some nouns are role nouns, and can be represented by properties. For instance brother is semantically a property ont:brotherOf with inverse ont:hasBrother . Absolute occurrences are existentially quantified as in brotherOf Thing (somebody's) brother. A genitive case or possessive verb with such nouns matches the built-in property value, so that brother of Jesus is just brotherOf Jesus and Jesus has a brother Jakob is jesus hasBrother jakob .

Many nouns have role frames like verbs. For instance road , like its hyperonym passage , has the same roles as the verbs pass or go , which the nouns reify: source, path and goal. road to India can be represented as r rdf:type ont:Road . r ont:hasGoal ont1:India .

Many nouns are derivatives of verbs and inherit semantic frame from them. An event noun like parsing in parsing is hard means the same as the infinitive in it is hard to parse . Semantically, both the noun and the infinitive can stand for the class pun ont0:Parse or (if a particular event is meant) some particular event instance in the class ont:Parse . An agent noun like leader means one who does the leading, i.e. the sem:doerOf of an event instance in ont:Lead . An instrument noun like parser is related by the sem:instrumentOf role to the event type ont:Parse .

Semantic features of nouns like sem:Animate and sem:Human are in TF semantic classes. (OWL/RDF classifications are, after all, rdf:type features.) Such semantic features name categories that commonly get grammaticalised in natural language. For instance, many languages distinguish animate from inanimate and human from non-human vocabulary. For most purposes, sem:Animate could be defined as the domain of the intelligent agent role sem:doerOf and sem:Human as the domain of the social agent role sem:playerOf . Thus the TF category sem:Human does not designate genus Homo but includes organizations and fictional characters. There is no built in semantic feature sem:Abstract in TF schema. Languages rarely grammaticalize abstractness as such (separate from the count/noncount axis). Indeed, it is hard to make an exhaustive split between the concrete and the abstract. Abstraction is a graded and many-dimensional affair rather than a two-way classification. Many concepts have more or less abstract and concrete uses. Abstractness has to do with the type-token distinction, concreteness with space, time and causation. What counts as type and what token varies. TF may leave it to applications to draw a line where needed.

Abstract nouns can be related with metamodeling (punning). For instance, we can say that red is a color with sem0:Red rdf:type sem:Colour and that color is a property with sem0:Colour rdf:type sem:Property . This stays first order, because the puns sem0:Red, sem0:Colour are not classes, but instances. As Plato would have it, adjective beautiful and abstract noun beauty can both designate the same OWL class sem:Beauty , the difference is grammatical. (Compare the common terminology theory position mentioned above that terminology can get by with just nouns.)

Adjectives

Some adjectives code classes like sem:Female representing female. Many more represent comparative relations. A comparative relation like "bigger than" can be coded directly as an object property biggerThan in OWL.

An alternative representation in some ways truer to natural language is the choice function representation, where the positive form of the adjective like "big" is taken as basic. But the adjective is not represented as a class, but as a relation to a comparison class: big is short for big for some class or other. A big animal belongs to class bigFor ont:Animal and a small animal to a disjoint class smallFor ont:Animal . The comparative form "bigger than" becomes a special case of the positive form, where the comparison class consists of two objects.

A third popular way to represent comparative relations is as a measure property, for instance a datatype property ont:length with values in length measures. Measure morphisms can be used with nominal scales too: three valued as in gender m|f|n or two-valued female true|false for classes like sem:Female . The choice of representation depends on the use. Nothing stops using more than one alternative coding. (For background theory see e.g. Krantz et al. 1972)

Some adjectives are more than two-place, so some reification is needed. A preference relation is a many-place relation between a preference subject, a respect of comparison, and at least two objects to compare. OWL directly supports only two-place relations, so we need to curry the arguments somehow. We solve this as follows. We use two disjoint properties ont:goodIn and sem:badIn relating an object to a ranking to represent comparison with choice functions. The comparison class or choice set is an instance r of class ont:Ranking . Property sem:hasRanking maps (animate?) subjects to Rankings that they hold. These constructs allow representing the four-way relation 'subject s prefers x to y in respect r' in binary relations. For instance, the preference "four legs good, two legs bad" is represented by triples r rdf:type ont:Ranking, four_legs ont:goodIn r, two_legs sem:badIn r , and the value judgment assigned to the animal farm f as f sem:hasRanking r .
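The way the four-place preference relation is recovered from the binary triples can be shown in a few lines. The triples are those of the example above. The function prefers and the tuple encoding are assumptions of this sketch, not TF vocabulary.

```python
# The triples of the animal farm example, encoded as Python tuples.
TRIPLES = {
    ("r", "rdf:type", "ont:Ranking"),
    ("four_legs", "ont:goodIn", "r"),
    ("two_legs", "sem:badIn", "r"),
    ("f", "sem:hasRanking", "r"),
}

def prefers(subject, x, y, triples=TRIPLES):
    """True if subject holds some ranking in which x is good and y is bad."""
    rankings = {o for s, p, o in triples if s == subject and p == "sem:hasRanking"}
    return any(
        (x, "ont:goodIn", r) in triples and (y, "sem:badIn", r) in triples
        for r in rankings
    )

print(prefers("f", "four_legs", "two_legs"))
print(prefers("f", "two_legs", "four_legs"))
```

The ranking r does the currying: each binary triple fixes one argument, and the join over r puts the four-place relation back together.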

The subjectivity of the preference is coded by the ranking r, which plays the role of a context in possible worlds semantics. Nobody else but the farm need have just this ranking. An absolute positive sem:Bad can be defined by fixing a ranking, for instance sem1:CommonPlace for rankings common to all people.

There is a long standing grammatical tradition starting from Aristotle's Categories that puts together a number of related notions under the rubric of antonyms (Aristotle's enantia "contrary"). Aristotle noted that antonymy is polysemous. Some cases are easy to do in OWL, like the case of relation converse (inverse) between say greater and less expressible by owl:inverseOf . Some may be less obvious depending on the coding, like the antonymy of good and bad under the choice function representation, which amounts to an owl:disjointWith relation between classes Good and Bad relative to a given ranking. Or the antonymy between begin and end, which requires an analysis of the events into constituent states. The difference to just slapping on the traditional label is that the reasoner knows what to infer from the relation. For human consumption, the traditional label may be all one wants.

Many adjectives are derivatives of verbs and inherit semantic frame from them. Some are participles, like missing person meaning person who is missing. Some look just the same but are paronyms from event nouns, like dancing shoes meaning shoes for dancing.

Some adjectives come from nouns. A paronymous adjective like American from America can express generic genitive/possessive case sem:hasRole : the adjective American can mean the same as the genitive America's. The paronymy relation between American and America can be made explicit in TF like this:

<owl:Class rdf:about="&ont;American">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="&sem;relatedTo"/>
      <owl:hasValue rdf:resource="&ont1;America"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

This says that the class American is a subclass of those things that are related to the country America. The TF top semantic property sem:relatedTo may also serve as the default semantics for the genitive case and the possessive verb have . What subproperty of being related to is relevant depends on the terms of the relation.

Adverbs

Adpositions are transitive (complemented) adverbs, including pre- and postpositions. A semantic case is an inverse of a semantic role. Pre- and postpositions and morphological cases in inflected languages are used to express semantic cases.

Polysemy

Polysemy is use of words in many related meanings. Less inflecting languages tend to be more polysemous than inflected ones, which prefer explicit paronymy with derivational affixes. In TF one can manage polysemy using an interlingual approach in small scale, so as to minimize the size of the semantic network (OWL graph).

Polysemy includes metonymy, regular or creative shift of meaning from one semantic entity to a different extensionally related one. It is different from homonymy, or the use of the same expression for unrelated meanings. Take as an example the use of the name of a country like Finland between animate (the people) and inanimate (the region), as in Finland fought Russia and Finland borders Russia . If we treat this as a homonymy, we distinguish two (individual) concepts ont1:PeopleOfFinland and ont1:RegionOfFinland , the former in class sem:Animate and the latter in sem:Inanimate , and two homonymous terms en-Finland-N_-_ont1-PeopleOfFinland , en-Finland-N_-_ont1-RegionOfFinland . If we do the same with all country names in all languages, we get a lot of homonyms.

Instead we can use an interlingual approach and treat this as a regular case of metonymy in the semantics. There is a language independent (individual) concept ont1:Finland of type ont:Country . It is related by relations like ont:hasPeople and ont:hasRegion to the concepts PeopleOfFinland and RegionOfFinland , which belong to classes ont:People, sem:Animate and ont:Region, sem:Inanimate , respectively. There is only one term en-Finland-N_-_ont1-Finland pointing to the country. The country is an extensional whole consisting of a region and a people (among other things perhaps). It may or may not make sense to class the whole ( ont1:Finland, ont:Country ) as animate or inanimate, since it has parts in both classes. (But one may feel that countries are primarily regions.) Finland is small can be true or false depending on which class is used in the ranking. It is left to external processing to resolve the polysemy in context.
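The external processing step might, in its simplest form, follow the metonymy links until it finds an entity of the class the context calls for. The sketch below assumes that the context supplies such a required class (say, sem:Animate for the subject of fight). That assumption, the list METONYMY_LINKS and the function resolve are all hypothetical.

```python
TRIPLES = {
    ("ont1:Finland", "rdf:type", "ont:Country"),
    ("ont1:Finland", "ont:hasPeople", "ont1:PeopleOfFinland"),
    ("ont1:Finland", "ont:hasRegion", "ont1:RegionOfFinland"),
    ("ont1:PeopleOfFinland", "rdf:type", "sem:Animate"),
    ("ont1:RegionOfFinland", "rdf:type", "sem:Inanimate"),
}

# Relations along which a meaning may shift (an assumption of this sketch).
METONYMY_LINKS = ("ont:hasPeople", "ont:hasRegion")

def resolve(concept, required_class, triples=TRIPLES):
    """Return the concept itself or a metonymically related one of the required class."""
    candidates = [concept] + sorted(
        o for s, p, o in triples if s == concept and p in METONYMY_LINKS
    )
    for cand in candidates:
        if (cand, "rdf:type", required_class) in triples:
            return cand
    return None

print(resolve("ont1:Finland", "sem:Animate"))    # as in: Finland fought Russia
print(resolve("ont1:Finland", "sem:Inanimate"))  # as in: Finland borders Russia
```

Only one term for Finland is stored; the two readings are computed from the single country concept.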

For another example, Finnish distinguishes between a verb valmistaa 'prepare', event noun valmistus 'preparation' and product noun valmiste 'preparate'. Assume the event and the product need separating to different concepts, say PrepareEvent and PrepareProduct (the latter related to the former as productOf PrepareEvent ). We need three signs relating the three forms to the two concepts. The English family prepare/preparation/preparate is similar, except preparation has process/product polysemy. If this is a common occurrence, create a superclass Prepare relating the two senses PrepareEvent and PrepareProduct and make preparation denote that.

The English word parse can be a verb or a noun denoting the event of parsing. (This common type of ambiguity is known as conversion in English grammar.) Instead of creating two signs, one for each part of speech and meaning, make just one form en_parse_NV designate one concept ont:Parse, leaving the polysemy to semantics. Split ont:Parse into ont:ParseEvent and ont:ParseProduct to accommodate Finnish jäsentää/jäsennys, where jäsentää is the verb and jäsennys is a noun polysemous between event and product senses.

Top ontologies

This section relates the TF schema TFS.owl to available large general purpose ontologies. Especially with large ongoing ontology efforts, it does not make sense to convert data to TF. That would undermine the whole idea of the Semantic Web. Rather than take a huge snapshot doomed to obsolescence, the more sensible solution is to use the original URIs as is and bridge them to TF.

WordNet

WordNet is a large open source lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. WordNet has been converted by others to RDF/OWL more than once. We bridge WordNet to TF rather than do yet another version. See section on WordNet conversion .

WordNet 3.0 OWL has almost half a million URIs. FinWordnet adds another couple of hundred thousand. This is more than twice the number of word senses in WordNet because synsets and senses have their own URIs in OWL. (WordNet has between one and two senses per synset on average.) The plan is to use TermFactory as a platform in a crowdsourcing effort to check the Finnish translation of WordNet.

WordNet consists of about 100K synsets. For editing the English-Finnish WordNet, TF splits the synsets into entry size files for English terms, Finnish terms, and (English) relationships. An entry is collected together from these pieces on demand with a php entry generator script, shown below. A url of form wn30entry?e=synset-entity-noun-1 generates the wordnet entry for synset-entity-noun-1 .

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix Wn30: <http://tfs.cc/wn30/> .
@prefix Wn30en: <http://tfs.cc/wn30/en/> .
@prefix Wn30enh: <http://tfs.cc/wn30/en/h/> .
@prefix Wn30fi: <http://tfs.cc/wn30/fi/> .

[] rdf:type meta:Entry ;
    meta:active Wn30fi:fi-<?=$_GET['e']?>.ttl ;
    meta:schema <http://tfs.cc/owl/wn/TFwn.owl> ;
    meta:template <file:etc/templates/sem.ttl> .

Wn30:<?=$_GET['e']?>.ttl rdf:type owl:Ontology ;
    owl:imports Wn30en:en-<?=$_GET['e']?>.ttl ;
    owl:imports Wn30enh:enh-<?=$_GET['e']?>.ttl ;
    owl:imports Wn30fi:fi-<?=$_GET['e']?>.ttl ;
    owl:versionInfo "TF WordNet version 0.2 21.01.2012" .

The current TF WordNet conversion uses the w3c namespace URIs.

SUMO

The OWL version of the Suggested Upper Merged Ontology SUMO (SUMO.owl) from Teknowledge (version 1.5.?) contains around 10K classes at the top. The full ontology with instances contains about 250K URIs, including names of airports, languages, cities, etc.

SUMO is an encyclopaedic ontology, while TF is geared toward terminology and natural language semantics. That is to say, TF semantic classes (like sem:Causer, sem:Doer ) name natural language semantic roles. They are evaluated by their ability to predict word choices. The TF schema is thinner and coarser than SUMO. Top classes in ontologies are particularly hard to match because more populous classes allow more splits. Often the best bet is to leave the uppermost topology alone and bridge lower down. SUMO has been tentatively bridged with TF in the bridge ontology SUMO2TFS.owl .

FinnOnto

The Finnish national FinnOnto project has built a sizable thesaurus ontology collection available at ONKI . Parts of it have been translated into English and Swedish. TF bridges to ONKI in the TF domain classification ontology TFDom.owl .

TF localization

TF classes and properties like concept, term, and expression are also described in TF as (instances of) concepts, terms, and expressions. This means that TF is capable of reflection : it can document and localise itself. TF schema localisation ontology TFL10n.owl provides definitions and translations of TFS descriptive classes and properties. This information is used to change language in the TF front end tools. Note how the property names in the example below have been partially localized into Finnish (the coverage depends on the localization ontology). This is not interface localization, but content localization (if the dichotomy makes sense at all here).

Show/hide localized WordNet entry

The localization was done with the following command line:

rewrite XHTML lang=fi locals=../owl/wn/TFwnL10n-fi.owl schema=../owl/wn/TFwn.owl ../io/entity.ttl < entity-l10n.xhtml

The TF to XHTML writer and its converse, the XHTML to TF parser are parametrised with a localisation model and lang code. Given these parameters, a TF model is serialised in XHTML with property and value URIs labeled with strings taken from the localisation model and language. Conversely, when an edited XHTML document is parsed back into TF, labels used in the XHTML document are mapped back to TF URIs by looking up corresponding localisation terms from the localisation model.

TF JSON content localization files

The localization file to the XHTML writer/reader can also be a json format file as produced by a Perl + SPARQL script localize below. The localization file's full URL can be given as a parameter. The json file extension must be .json . A location for localization files can be specified with TF option TF_LOCALS in etc/tf.properties . The default location is where the ontologies are, file:owl/ . If a bare relative path of form something.json is explicitly given, it is resolved against the value of TF_LOCALS . If no localisation file is given, then the value of TF_LOCALS is used as the URL. If this URL is a directory and localization lang is (say) fi, then filename fi.json is resolved against it.
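The resolution rules above can be summarised in a few lines of Python. This is only a sketch of the rules as described in the text, not TF code: the function name resolve_locals is hypothetical, a value is treated as a full URL when it carries a URL scheme, and a TF_LOCALS value is assumed to be a directory when it ends in a slash.

```python
from urllib.parse import urlparse

# Default location named in the text; normally read from etc/tf.properties.
DEFAULT_TF_LOCALS = "file:owl/"


def resolve_locals(locals_param, lang, tf_locals=DEFAULT_TF_LOCALS):
    """Return the URL of the localization file to use (hypothetical helper)."""
    if locals_param:
        if urlparse(locals_param).scheme:
            # A full URL is used as given.
            return locals_param
        # A bare relative path is resolved against TF_LOCALS.
        return tf_locals.rstrip("/") + "/" + locals_param
    if tf_locals.endswith("/"):
        # Assumption: a trailing slash marks a directory; use LANG.json in it.
        return tf_locals + lang + ".json"
    # Otherwise TF_LOCALS itself is the localization file.
    return tf_locals


print(resolve_locals("something.json", "fi"))
print(resolve_locals(None, "fi"))
```

Plain string concatenation is used instead of a generic URL joiner because scheme-only URLs like file:owl/ are not joined consistently by all libraries.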

JSON localization files can be generated from localization ontologies with the help of the Perl script io/script/localize. It uses the localization script template io/sparql/locals.sparql. The format of TF JSON localization files is the JSON result set format defined in the SPARQL 1.1 recommendation, produced by the Jena ARQ ResultSetFormatter.outputAsJSON from a SPARQL SELECT query. Here is an example abbreviated from the result of running localize on the TF Finnish localization ontology http://tfs.cc/TF/owl/TFL10n-fi.owl. The top level keys are "prefixes", "head", and "results". Prefixes are namespace prefixes from the localization ontology. Head is the sequence of the SELECT query variables. Results are a set of "bindings", and each binding binds some of the query variables, giving the RDF type of the variable and the actual value of the variable.

The json localization source file format is multilingual. There is a minimized monolingual json localization file format compiled from the source format that is used in TF front end localization .

{
  "prefixes": {
    "": "http://tfs.cc/TF/owl/TFL10n-fi.owl/",
    "exp0": "http://tfs.cc/exp0/",
    ...
  } ,
  "head": {
    "vars": [ "inst" , "lang" , "base" ]
  } ,
  "results": {
    "bindings": [
      {
        "inst": { "type": "uri" , "value": "http://tfs.cc/exp0/Chinese" } ,
      } ,
      {
        "inst": { "type": "uri" , "value": "http://tfs.cc/exp0/English" } ,
        "lang": { "type": "literal" , "value": "fi" } ,
        "base": { "type": "literal" , "value": "englanti" }
      } ,
      {
        "inst": { "type": "uri" , "value": "http://tfs.cc/exp0/Finnish" } ,
        "lang": { "type": "literal" , "value": "fi" } ,
        "base": { "type": "literal" , "value": "suomi" }
      } ,
      ...
    ]
  }
}
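To illustrate how such a result set can serve as a lookup table, here is a minimal Python sketch. It is not the TF implementation and does not reproduce the minimized monolingual format mentioned above; the function names label_table and uri_table are hypothetical, and the sample data is a cleaned-up copy of the abbreviated example.

```python
import json

# Hypothetical sample in the SPARQL JSON result set format.
SAMPLE = json.loads("""
{
  "head": { "vars": ["inst", "lang", "base"] },
  "results": { "bindings": [
    { "inst": { "type": "uri", "value": "http://tfs.cc/exp0/Chinese" } },
    { "inst": { "type": "uri", "value": "http://tfs.cc/exp0/English" },
      "lang": { "type": "literal", "value": "fi" },
      "base": { "type": "literal", "value": "englanti" } },
    { "inst": { "type": "uri", "value": "http://tfs.cc/exp0/Finnish" },
      "lang": { "type": "literal", "value": "fi" },
      "base": { "type": "literal", "value": "suomi" } }
  ] }
}
""")


def label_table(result_set, lang):
    """Map instance URI to label for one language; unbound rows are skipped."""
    table = {}
    for binding in result_set["results"]["bindings"]:
        if "lang" not in binding or "base" not in binding:
            continue
        if binding["lang"]["value"] != lang:
            continue
        # Keep the first label found for each instance.
        table.setdefault(binding["inst"]["value"], binding["base"]["value"])
    return table


def uri_table(labels):
    """Invert a label table so that labels can be mapped back to URIs."""
    return {label: uri for uri, label in labels.items()}


labels = label_table(SAMPLE, "fi")
print(labels)
print(uri_table(labels))
```

The first table corresponds to what a writer needs (URI to label), the inverted one to what a parser needs when an edited document is read back (label to URI).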

The localize script

The localize script is a Perl script wrapper to a pellet4tf SPARQL query. It accepts optional parameters for localization language, the type of entries to localize, the source ontology to use, and the output format. The following example produces a localization file in JSON for property names in Finnish using a localization ontology on http://tfs.cc . The defaults for the options and the command run by the script are shown in the second example. Warning: the script creates a temporary file named tmp.sparql in the directory where it is run. Examples:

> localize -h
# write localization file for given LANG and TYPE
# usage: localize -l LANG -t TYPE -e engine -X axioms -o format -q script file...
# example: localize -l fi -t meta:Property
# default: localize # localizes TF schema vocabulary
# help: localize -h
> localize -l fi -t meta:Property -o JSON http://tfs.cc/owl/TFL10n-fi.owl
localize TF properties in Finnish
> localize -t bio:Country -e Stacked -X biobridge.ttl biocaster.ttl

Option -h prints usage and exits. The plain script name without arguments localizes the TermFactory vocabulary in TFS.owl. Option -l LANG only localizes to LANG. Option -t TYPE only localizes instances of TYPE. Option -e chooses the Pellet4TF engine to use. Option -X sets axioms for the Pellet4TF Stacked engine. Option -q chooses the localization script to use (the default is $TF_HOME/io/sparql/locals.sparql; an alternative one $TF_HOME/io/sparql/translations.sparql matches the TFTop schema). The remaining arguments are localization ontologies (the default is $TF_HOME/owl/TFL10n.owl).

Another script script/localize2 completes a given TermFactory json localization table for a given language using a localization ontology and a schema ontology. The script reads in a partially completed localization table as produced by localize , and uses the localization and schema ontologies to fill out the gaps, looking for localization strings for those concepts that have no label in the table. The script looks for translations in the localization ontology using sign properties defined in the schema ontology. If there is no exact label for a given concept in the localization ontology, the script looks for an as yet unused label for one of its superclasses. For instance, if an equivalent for "black adder" is missing for a given language, it may propose the translation for "viper". Examples:

> localize2 -h
# use TFLocalizer to fill out json localization file for given lang schema and localization ontology
# usage: localize2 -g -s schema -L l10n -l lang *.json
# example: localize2 -s TFTop.owl -L TFL10n.owl -l fi ../io/fi0.json
# help: localize2 -h
> cat fi.json | tail
      {
        "inst": { "type": "uri" , "value": "http://tfs.cc/exp0/Mandarin" } ,
        "lang": { "datatype": "http://www.w3.org/2001/XMLSchema#string" , "type": "typed-literal" , "value": "fi" } ,
      } ,
      {
        "inst": { "type": "uri" , "value": "http://tfs.cc/exp0/Taiwanese" } ,
        "lang": { "datatype": "http://www.w3.org/2001/XMLSchema#string" , "type": "typed-literal" , "value": "fi" } ,
      }
    ]
  }
}
> localize2 -s TFTop.owl -L TFL10n.owl -l fi ../io/fi.json | tail
...
{
  "inst": { "type": "uri", "value": "http://tfs.cc/exp0/Mandarin" },
  "lang": { "datatype": "http://www.w3.org/2001/XMLSchema#string", "type": "typed-literal", "value": "fi" },
  "base": "kiina",
  "hyper": "http://tfs.cc/exp/Chinese"
},
{
  "inst": { "type": "uri", "value": "http://tfs.cc/exp0/Taiwanese" },
  "lang": { "datatype": "http://www.w3.org/2001/XMLSchema#string", "type": "typed-literal", "value": "fi" },
  "base": "kiina",
  "hyper": "http://tfs.cc/exp/Chinese"
}
]}
}

Here Mandarin and Taiwanese do not have a direct Finnish equivalent in the localization ontology (TFL10n.owl), but both get localized by the label of their hypernym, kiina 'Chinese'. Setting switch -g ('g' for 'global') prevents using the same hypernym twice, which guarantees globally unique labels but may not fill as many gaps.
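The gap-filling strategy can be sketched as follows. This is an illustration of the idea only, not the TFLocalizer code: the function name propose_labels and the data structures are hypothetical, and the superclass lists are assumed to be given nearest first.

```python
def propose_labels(concepts, labels, superclasses, unique=False):
    """Propose a hypernym label for each concept that has no label of its own.

    labels:       dict concept -> label in the target language
    superclasses: dict concept -> list of superclasses, nearest first
    unique:       if True, no hypernym is used twice (compare switch -g)
    """
    proposals = {}
    used = set()
    for concept in concepts:
        if concept in labels:
            continue  # already localized, nothing to fill
        for hyper in superclasses.get(concept, []):
            if hyper not in labels:
                continue
            if unique and hyper in used:
                continue
            proposals[concept] = {"base": labels[hyper], "hyper": hyper}
            used.add(hyper)
            break
    return proposals


labels = {"http://tfs.cc/exp/Chinese": "kiina"}
supers = {
    "http://tfs.cc/exp0/Mandarin": ["http://tfs.cc/exp/Chinese"],
    "http://tfs.cc/exp0/Taiwanese": ["http://tfs.cc/exp/Chinese"],
}
print(propose_labels(list(supers), labels, supers))
print(propose_labels(list(supers), labels, supers, unique=True))
```

Without the unique flag both concepts receive the label of their shared hypernym, as in the example output above; with it, only the first one does.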

Localization queries

TF also provides a dynamic SPARQL script io/script/localize.php that helps replicate the functionality of the offline localize script through the TF web API. When requested from a web server, it dynamically generates a localization query for given language and class parameters. (Without the parameters, it generates a query that matches all instances and languages in the data). Here is an example url and the query it generates.

http://localhost/localize.php?l=fi

# localize.php - localize TF instances
# do not modify (comment lines significant)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ont: <http://tfs.cc/ont/>
PREFIX term: <http://tfs.cc/term/>
PREFIX meta: <http://tfs.cc/meta/>
PREFIX exp: <http://tfs.cc/exp/>
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
SELECT DISTINCT ?inst ?lang ?base WHERE {
  # ?inst rdf:type ?TYPE .
  FILTER(?lang = "fi") .
  OPTIONAL {
    ?inst term:referentOf ?term .
    ?term term:hasDesignation ?exp .
    ?exp exp:langCode ?lang .
    ?exp exp:baseForm ?base .
  }
  OPTIONAL {
    ?term term:hasReferent ?inst .
    ?term term:hasDesignation ?exp .
    ?exp exp:langCode ?lang .
    ?exp exp:baseForm ?base .
  }
} ORDER BY ASC(?inst) ASC(?lang) ASC(?base)

The url of the localize.php script with desired query parameters can then be fed to the TF QueryForm servlet and run against some specified localization repos. Here is an example query url for such a TF localization query. The QueryForm query string sets four parameters here: the localize.php url (q), the repos to query as data (r), the query engine to use (e), and the SELECT query output format (f2). Note that the localize.php url has been percent encoded to avoid ambiguity. The result of the query should be identical to the result of the commandline script localize -l fi assuming the queried repository and engine are the same.

http://localhost:8080/TermFactory/query?q=http%3A%2F%2Flocalhost%2Flocalize.php%3Fl%3Dfi&r=http%3A%2F%2Ftfs.cc%2Fowl%2FTFL10n-fi.owl&e=MIXED&f2=JSON
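A query URL of this form can be assembled programmatically so that the nested localize.php URL is percent encoded correctly. The following Python sketch uses only the standard library; the function name query_url is hypothetical, while the parameter names q, r, e and f2 are the ones described above.

```python
from urllib.parse import urlencode


def query_url(endpoint, script_url, repo, engine="MIXED", fmt="JSON"):
    """Build a QueryForm URL; urlencode percent-encodes each parameter value."""
    params = {"q": script_url, "r": repo, "e": engine, "f2": fmt}
    return endpoint + "?" + urlencode(params)


# The inner URL carries its own query string (l=fi), so it must be encoded
# as a whole when passed as the value of q.
script = "http://localhost/localize.php?" + urlencode({"l": "fi"})
url = query_url(
    "http://localhost:8080/TermFactory/query",
    script,
    "http://tfs.cc/owl/TFL10n-fi.owl",
)
print(url)
```

Encoding the inner URL as a parameter value keeps its own ? and = characters from being read as part of the outer query string.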

How localization works

Localization in TF is nothing special: it is just another term lookup using SPARQL. Classes that are used in TF to describe terms and expressions, like ont:Concept, exp:Expression, term:Term, exp:Noun, are in no way different from any others for which TF provides multilingual terms and expressions. Such terms are associated to these classes as properties of their respective representative instances (concepts, i.e. class puns). It is straightforward to write SPARQL queries that localise these, or any other, terms in the language(s) of interest.

For OWL object properties, like meta:hasDomain , term:seeFalseFriend , and data properties like exp:baseForm , a link to localising vocabulary has to be provided in a different (but analogous) way. For OWL 2, this is not difficult because OWL 2 allows punning classes and properties, so that a class or property can be metamodeled (given first order properties) with a homonymous instance. For backward compatibility with OWL 1, we may want to keep the TF convention to pun a property with a paronymous instance that has the same local name as the property but a minimally different, conventionally related namespace prefix. For instance, meta:hasDomain is punned with instance meta0:hasDomain . The pun is then localised as before. For efficiency, property puns are classified just the same way as the corresponding properties.

It is straightforward to write a sparql query that fetches for any given object (class, property, or instance) its localised names in different languages. Here is an example.

lcarlson@lhc:~/Data/CF/TF/io/ pellet query -q ploc.sparql -d ../owl/TFS.owl
Query:
# ploc.sparql - localize names of TF properties
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ont: <http://tfs.cc/ont/>
PREFIX term: <http://tfs.cc/term/>
PREFIX meta: <http://tfs.cc/meta/>
PREFIX exp: <http://tfs.cc/exp/>
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
SELECT DISTINCT ?prop ?lang ?base WHERE {
  ?prop rdf:type meta:Property .
  OPTIONAL {
    ?prop term:referentOf ?term .
    ?term term:hasDesignation ?exp .
    ?exp exp:langCode ?lang .
    ?exp exp:baseForm ?base .
  }
} ORDER BY ASC(afn:localname(?prop)) # ASC(?lang) ASC(?base)
-----------------------------------------------------
Query Results (32 answers):
prop | lang | base
=============================================================================================================================
baseForm | "fi"^^http://www.w3.org/2001/XMLSchema#string | "perusmuoto"^^http://www.w3.org/2001/XMLSchema#string
catCode | "fi"^^http://www.w3.org/2001/XMLSchema#string | "sanaluokka"^^http://www.w3.org/2001/XMLSchema#string
ctryCode | "fi"^^http://www.w3.org/2001/XMLSchema#string | "maatunnus"^^http://www.w3.org/2001/XMLSchema#string
frequency | "fi"^^http://www.w3.org/2001/XMLSchema#string | "taajuus"^^http://www.w3.org/2001/XMLSchema#string
gender | "fi"^^http://www.w3.org/2001/XMLSchema#string | "suku"^^http://www.w3.org/2001/XMLSchema#string
hasFunction | "en"^^http://www.w3.org/2001/XMLSchema#string | "for"^^http://www.w3.org/2001/XMLSchema#string
hasGoal | "fi"^^http://www.w3.org/2001/XMLSchema#string | "kohde"^^http://www.w3.org/2001/XMLSchema#string
hasGoal | "zh"^^http://www.w3.org/2001/XMLSchema#string | "为了"^^http://www.w3.org/2001/XMLSchema#string
hasHead | "en"^^http://www.w3.org/2001/XMLSchema#string | "headword"^^http://www.w3.org/2001/XMLSchema#string
hasHead | "fi"^^http://www.w3.org/2001/XMLSchema#string | "edussana"^^http://www.w3.org/2001/XMLSchema#string
hasInstrument | "en"^^http://www.w3.org/2001/XMLSchema#string | "in"^^http://www.w3.org/2001/XMLSchema#string
hasInstrument | "fi"^^http://www.w3.org/2001/XMLSchema#string | "väline"^^http://www.w3.org/2001/XMLSchema#string
hasMeans | "fi"^^http://www.w3.org/2001/XMLSchema#string | "keino"^^http://www.w3.org/2001/XMLSchema#string
hasPart | "en"^^http://www.w3.org/2001/XMLSchema#string | "part"^^http://www.w3.org/2001/XMLSchema#string
hasPart | "fi"^^http://www.w3.org/2001/XMLSchema#string | "osa"^^http://www.w3.org/2001/XMLSchema#string
hasPlace | "en"^^http://www.w3.org/2001/XMLSchema#string | "in"^^http://www.w3.org/2001/XMLSchema#string
hasPlace | "fi"^^http://www.w3.org/2001/XMLSchema#string | "paikka"^^http://www.w3.org/2001/XMLSchema#string
hasPlace | "zh"^^http://www.w3.org/2001/XMLSchema#string | "在___里"^^http://www.w3.org/2001/XMLSchema#string
hasReferent | "fi"^^http://www.w3.org/2001/XMLSchema#string | "tarkoite"^^http://www.w3.org/2001/XMLSchema#string
hasTime | "fi"^^http://www.w3.org/2001/XMLSchema#string | "aika"^^http://www.w3.org/2001/XMLSchema#string
headPosition | "fi"^^http://www.w3.org/2001/XMLSchema#string | "edussanan paikka"^^http://www.w3.org/2001/XMLSchema#string
hidden | "fi"^^http://www.w3.org/2001/XMLSchema#string | "piilotettu"^^http://www.w3.org/2001/XMLSchema#string
langCode | "fi"^^http://www.w3.org/2001/XMLSchema#string | "kielitunnus"^^http://www.w3.org/2001/XMLSchema#string
number | "fi"^^http://www.w3.org/2001/XMLSchema#string | "luku"^^http://www.w3.org/2001/XMLSchema#string
register | "fi"^^http://www.w3.org/2001/XMLSchema#string | "rekisteri"^^http://www.w3.org/2001/XMLSchema#string
romanisation | "fi"^^http://www.w3.org/2001/XMLSchema#string | "latinalaistus"^^http://www.w3.org/2001/XMLSchema#string
see | "fi"^^http://www.w3.org/2001/XMLSchema#string | "katso"^^http://www.w3.org/2001/XMLSchema#string
seeFalseFriend | "fi"^^http://www.w3.org/2001/XMLSchema#string | "petollinen ystävä"^^http://www.w3.org/2001/XMLSchema#string
status | <<null>> | <<null>>
text | <<null>> | <<null>>
usage | "fi"^^http://www.w3.org/2001/XMLSchema#string | "käyttö"^^http://www.w3.org/2001/XMLSchema#string
valuation | "fi"^^http://www.w3.org/2001/XMLSchema#string | "arvotus"^^http://www.w3.org/2001/XMLSchema#string
lcarlson@lhc:~/Data/CF/TF/io/

String datatype values (values of TF picklist properties) can be self-documented in a similar way. TF creates an association between the picklist property exp:langCode, its string value the language code en, and the class exp:English that lets TF self-document language codes as a regular TF concept/term/expression association. The first step is to construe the string value en of the property exp:langCode as the base form of an expression exp1:ISO-en-N designating term term1:ISO-631-1-en. This term is classed as an abbreviation (member of term:Abbreviation) whose referent is the concept exp0:English, the English language. (The language code of a term is usually that of its expression, but not necessarily always. English terms may be built on expressions borrowed from foreign languages. They may not mean the same in English as they do in the loan language.) Here is the entry for the language code "en" in TF. It expresses that "en" is the base form of the ISO two-letter language code that designates English.

term1:ISO-631-1-en term:hasDesignation exp1:ISO-en-N .
exp1:ISO-en-N exp:baseForm "en" .
term1:ISO-631-1-en term:hasReferent exp0:English .

All further properties of exp0:English can be accessed from the language code through this chain of associations, including designations and descriptions in different languages. An example documentation query is shown below. Other picklist properties can be dealt with analogously. But literal values are not really meant to be localized. They are what the name says, literal. If a picklist property needs localizing, it is better to make the property an object property and its values TF resources chosen from a short list.
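The chain of associations from a code string to localized names can be traced by hand over a small set of triples. The Python sketch below is only an illustration: the three triples for "en" and the type term:Abbreviation come from the text, while the Finnish term and expression identifiers (term1:fi-englanti, exp1:fi-englanti-N) are hypothetical stand-ins.

```python
TRIPLES = {
    # Entry for the language code "en", as given in the text.
    ("term1:ISO-631-1-en", "term:hasDesignation", "exp1:ISO-en-N"),
    ("exp1:ISO-en-N", "exp:baseForm", "en"),
    ("term1:ISO-631-1-en", "term:hasReferent", "exp0:English"),
    ("term1:ISO-631-1-en", "rdf:type", "term:Abbreviation"),
    # Hypothetical Finnish term for the same referent.
    ("term1:fi-englanti", "term:hasReferent", "exp0:English"),
    ("term1:fi-englanti", "term:hasDesignation", "exp1:fi-englanti-N"),
    ("exp1:fi-englanti-N", "exp:langCode", "fi"),
    ("exp1:fi-englanti-N", "exp:baseForm", "englanti"),
}


def subjects(prop, obj):
    return {s for s, p, o in TRIPLES if p == prop and o == obj}


def objects(subj, prop):
    return {o for s, p, o in TRIPLES if s == subj and p == prop}


def referents_of_code(code):
    """Code string -> expression -> abbreviation term -> referent concept."""
    found = set()
    for exp0 in subjects("exp:baseForm", code):
        for term0 in subjects("term:hasDesignation", exp0):
            if "term:Abbreviation" in objects(term0, "rdf:type"):
                found |= {(exp0, ref) for ref in objects(term0, "term:hasReferent")}
    return found


def names_for_code(code):
    """Collect (lang, base form) pairs of the other designations of the referent."""
    names = set()
    for exp0, referent in referents_of_code(code):
        for term in subjects("term:hasReferent", referent):
            for exp in objects(term, "term:hasDesignation") - {exp0}:
                for lang in objects(exp, "exp:langCode"):
                    names |= {(lang, base) for base in objects(exp, "exp:baseForm")}
    return names


print(names_for_code("en"))
```

The sketch reads term:hasReferent in both directions instead of relying on an inverse property being inferred by a reasoner.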

pellet query -q vloc.sparql -d ../owl/TFS.owl
Query:
# vloc2.sparql - localize values of TF properties
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ont: <http://tfs.cc/ont/>
PREFIX term: <http://tfs.cc/term/>
PREFIX meta: <http://tfs.cc/meta/>
PREFIX exp: <http://tfs.cc/exp/>
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
SELECT DISTINCT ?prop ?code ?obj ?lang ?base WHERE {
  ?prop rdfs:subPropertyOf meta:picklistProperty .
  FILTER ( ?prop != meta:picklistProperty ) .
  ?inst ?prop ?code .
  OPTIONAL {
    ?exp0 exp:baseForm ?code .
    ?term0 term:hasDesignation ?exp0 .
    ?term0 rdf:type term:Abbreviation .
    ?term0 term:hasReferent ?obj .
    ?obj term:referentOf ?term .
    ?term term:hasDesignation ?exp .
    FILTER ( ?exp != ?exp0 ) .
    ?exp exp:langCode ?lang .
    ?exp exp:baseForm ?base .
  }
} ORDER BY ASC(afn:localname(?prop)) # ASC(?lang) ASC(?base)
-----------------------------------------------------
Query Results (18 answers):
prop | code | obj | lang | base
==============================================================================================================================================================================
catCode | "A"^^http://www.w3.org/2001/XMLSchema#string | Adjective | "en"^^http://www.w3.org/2001/XMLSchema#string | "adjective"^^http://www.w3.org/2001/XMLSchema#string
catCode | "P"^^http://www.w3.org/2001/XMLSchema#string | Adposition | "en"^^http://www.w3.org/2001/XMLSchema#string | "adposition"^^http://www.w3.org/2001/XMLSchema#string
catCode | "N"^^http://www.w3.org/2001/XMLSchema#string | Noun | "en"^^http://www.w3.org/2001/XMLSchema#string | "noun"^^http://www.w3.org/2001/XMLSchema#string
catCode | "V"^^http://www.w3.org/2001/XMLSchema#string | Verb | "en"^^http://www.w3.org/2001/XMLSchema#string | "verb"^^http://www.w3.org/2001/XMLSchema#string
ctryCode | "CN"^^http://www.w3.org/2001/XMLSchema#string | China | "en"^^http://www.w3.org/2001/XMLSchema#string | "China"^^http://www.w3.org/2001/XMLSchema#string
ctryCode | "FI"^^http://www.w3.org/2001/XMLSchema#string | Finland | "en"^^http://www.w3.org/2001/XMLSchema#string | "Finland"^^http://www.w3.org/2001/XMLSchema#string
ctryCode | "FI"^^http://www.w3.org/2001/XMLSchema#string | Finland | "fi"^^http://www.w3.org/2001/XMLSchema#string | "Suomi"^^http://www.w3.org/2001/XMLSchema#string
langCode | "en"^^http://www.w3.org/2001/XMLSchema#string | English | "en"^^http://www.w3.org/2001/XMLSchema#string | "English"^^http://www.w3.org/2001/XMLSchema#string
langCode | "en"^^http://www.w3.org/2001/XMLSchema#string | English | "fi"^^http://www.w3.org/2001/XMLSchema#string | "englanti"^^http://www.w3.org/2001/XMLSchema#string
langCode | "en"^^http://www.w3.org/2001/XMLSchema#string | English | "zh"^^http://www.w3.org/2001/XMLSchema#string | "英语"^^http://www.w3.org/2001/XMLSchema#string
langCode | "zh"^^http://www.w3.org/2001/XMLSchema#string | Chinese | "en"^^http://www.w3.org/2001/XMLSchema#string | "Chinese"^^http://www.w3.org/2001/XMLSchema#string
langCode | "zh"^^http://www.w3.org/2001/XMLSchema#string | Chinese | "fi"^^http://www.w3.org/2001/XMLSchema#string | "kiina"^^http://www.w3.org/2001/XMLSchema#string
langCode | "zh"^^http://www.w3.org/2001/XMLSchema#string | Chinese | "zh"^^http://www.w3.org/2001/XMLSchema#string | "中文"^^http://www.w3.org/2001/XMLSchema#string
langCode | "fi"^^http://www.w3.org/2001/XMLSchema#string | Finnish | "en"^^http://www.w3.org/2001/XMLSchema#string | "Finnish"^^http://www.w3.org/2001/XMLSchema#string
langCode | "fi"^^http://www.w3.org/2001/XMLSchema#string | Finnish | "fi"^^http://www.w3.org/2001/XMLSchema#string | "suomi"^^http://www.w3.org/2001/XMLSchema#string
langCode | "fi"^^http://www.w3.org/2001/XMLSchema#string | Finnish | "zh"^^http://www.w3.org/2001/XMLSchema#string | "芬兰语"^^http://www.w3.org/2001/XMLSchema#string
langCode | "iso"^^http://www.w3.org/2001/XMLSchema#string | <<null>> | <<null>> | <<null>>
langCode | "tfs"^^http://www.w3.org/2001/XMLSchema#string | <<null>> | <<null>> | <<null>>
lcarlson@lhc:~/Data/CF/TF/io/

The TF localisation vocabulary is kept in a separate ontology document TFL10n.owl which imports TFS.owl . The localisation document also contains property type information used by the TF front end. The following query reports types of selected concept, term, and expression properties.

pellet4tf query -q ptype.sparql file:../owl/TFL10n.owl
Query Results (17 answers):
prop           | type                     | type2
============================================================
ctryCode       | ConceptDataProperty      | PicklistProperty
referentOf     | ConceptObjectProperty    | <<null>>
baseForm       | ExpressionDataProperty   | <<null>>
catCode        | ExpressionDataProperty   | PicklistProperty
gender         | ExpressionDataProperty   | PicklistProperty
headPosition   | ExpressionDataProperty   | <<null>>
langCode       | ExpressionDataProperty   | PicklistProperty
number         | ExpressionDataProperty   | PicklistProperty
romanisation   | ExpressionDataProperty   | <<null>>
text           | ExpressionDataProperty   | <<null>>
hasHead        | ExpressionObjectProperty | <<null>>
register       | TermDataProperty         | PicklistProperty
status         | TermDataProperty         | PicklistProperty
usage          | TermDataProperty         | PicklistProperty
valuation      | TermDataProperty         | PicklistProperty
hasReferent    | TermObjectProperty       | <<null>>
seeFalseFriend | TermObjectProperty       | <<null>>

Classes and their subclasses can be reported in tree form using pellet4tf classify or in tabular form using sparql query child.sparql . Class names can be localised using query cloc.sparql .

How to add localizations

There are many ways to find and edit concepts which have not got localizations in a given language. The following scripts are provided in file:io/sparql as a starter.

locals.sparql select existing localizations in all languages
nolocals-fi.sparql select missing localizations in Finnish
describe-unlocalized-fi.sparql describe concepts missing localization in Finnish
construct-unlocalized-fi.sparql construct templates for missing localizations in Finnish

locals.sparql is the query used by the command line script localize. It produces all the localizations found in the dataset. It can also be run through the query form, for instance. nolocals-fi.sparql is its complement for Finnish. It produces a table of concepts missing localizations in Finnish. With the JSON output setting, it produces a JSON table that can be read into the TF/MOLTO equivalents editor. (It needs the ARQ projection syntax extension of SPARQL, see here.) describe-unlocalized-fi.sparql runs a TF DESCRIBE query on concepts which have no Finnish localization. It shows among other things their localizations in other languages. It does not create templates for the missing equivalents. Templates can be created individually in the editor using the input templates of the editor's TF Insert menu. construct-unlocalized-fi.sparql creates templates for missing Finnish localizations. Running first the describe script and then the construct script on the result with the accumulation (-M) option set produces both.

pellet4tf query -e MIXED -q sparql/describe-unlocalized-fi.sparql ../owl/TFL10n.owl > step1.ttl
time pellet4tf query -o XHTML -R ont:Concept -e MIXED -q sparql/construct-unlocalized-fi.sparql -M step1.ttl ../owl/TFL10n.owl > step2.xhtml

Show/hide step2.xhtml

To avoid duplication of work, one should first check if there are candidates for localization already elsewhere in TF before submitting new proposals.

Ontologies in the Web

Much of the complexity of TF reflects the complexity of distributed management of large ontologies in the Web. The tactic is one of divide and conquer: divide large ontologies into manageable size pieces, and reduce complexity using reasoning, inheritance, redirection, and similar means.

Reasoning in TF

Reasoning can reduce the traveling size of an ontology by letting it assert much less than it implies. It is possible to define, for a given TF term ontology, a notion of a minimal set of axioms from which all statements of the ontology can be derived, and from which no axiom can be removed without losing statements in the theory. An axiom set is not generally unique for a given theory; there can be many axiom sets of the same smallest size. At the other end, there is the notion of materialization, or inferential closure of the ontology, which contains all the (non-tautological) statements (in the ontology's vocabulary) derivable from the axiom set. This is unique for an ontology, but not necessarily finite. The closure of a TF term ontology is a finite (though possibly large) set. An axiom set for a term ontology is useful to have for editing and versioning because the set is minimal and edits only need to be done in one place. A (partial) materialization can be useful for viewing and querying because it allows faster access to implicit content than online reasoning, provided the closure is not prohibitively large. (cf. Ontotext on reasoning strategies.)

The difference between statements, axioms, and theorems must be observed when editing an ontology. In TF, editing is a syntactic operation on a set of triples. Statements, not facts, are edited. This means that a fact may stay in an ontology although a triple gets deleted, if it is a theorem from some axioms that remain. Deleting facts would not be just more complex, but would bring in tricky problems of nonmonotonic reasoning or belief revision.
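The distinction between editing statements and deleting facts can be made concrete with a toy example. The Python sketch below assumes that the only inference rule is the transitivity of a subclass relation, which is far less than what an OWL reasoner does; the class names ex:Adder, ex:Viper and ex:Snake and the function names are hypothetical.

```python
def closure(statements):
    """Materialize all subclass facts derivable by transitivity."""
    facts = set(statements)
    while True:
        derived = {(a, d) for (a, b) in facts for (c, d) in facts if b == c} - facts
        if not derived:
            return facts
        facts |= derived


def delete_statement(statements, statement):
    """Editing is syntactic: it removes a stated triple, nothing more."""
    return set(statements) - {statement}


def redundant(statements):
    """Statements that are theorems of the remaining statements."""
    return {s for s in statements if s in closure(set(statements) - {s})}


# Each pair (sub, super) stands for one rdfs:subClassOf statement.
stated = {
    ("ex:Adder", "ex:Viper"),
    ("ex:Viper", "ex:Snake"),
    ("ex:Adder", "ex:Snake"),  # stated, but also derivable from the other two
}

edited = delete_statement(stated, ("ex:Adder", "ex:Snake"))
print(("ex:Adder", "ex:Snake") in closure(edited))
print(redundant(stated))
```

After the deletion the fact is still in the closure, because it follows from the two remaining statements. Dropping all redundant statements at once is not safe in general, since two statements may each be derivable only with the help of the other.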

Property inheritance in OWL

One of the attractions of ontologies is property inheritance. Instead of having to assign properties to each instance (say a term or expression) individually, one assigns shared properties to classes, whence they get inherited to class instances. In description logic, inheritance becomes classical logical entailment. OWL allows expressing some property inheritance by means of defining classes by description.

Take as an example the ontology of languages and language codes in TF. Class ont:Language contains concepts (puns) for individual languages like exp0:English, which reifies the language adjective exp:English. exp:English denotes the set of all pieces of the English language, including English terms and expressions, that have language code "en". The language code "en" in turn is the base form of a term (an abbreviation) that refers to English. Terms and expressions that belong to English should have the language code "en". Having "en" as language code means being English. Intuitively, then, the first triple below should be equivalent to the second:

[] exp:langCode "en" .
[] rdf:type exp:English .

Indeed, this equivalence can be expressed in OWL as follows.

<owl:Class rdf:about="&exp;English">
  <owl:equivalentClass>
    <owl:Restriction>
      <owl:onProperty rdf:resource="&exp;langCode"/>
      <owl:hasValue rdf:datatype="&xsd;string">en</owl:hasValue>
    </owl:Restriction>
  </owl:equivalentClass>
  <rdfs:subClassOf rdf:resource="&exp;Language"/>
  <rdfs:comment rdf:datatype="&xsd;string">anything in the English language</rdfs:comment>
</owl:Class>

Here the element owl:Restriction defines a class by description, namely, the class of those things which have language code 'en'. Any instance that belongs to this class has that language code. Or that is what the axiom says. It is up to a reasoner to enforce this classification. (OWL restrictions tend to look ugly, so it may be convenient to invent a class whose main job is to entail restrictions. Then the restrictions are easily imposed by just subclassing from that class. exp:English above is an example - its instances inherit language code 'en'.)
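
What a reasoner does with this one axiom can be imitated by hand. The Python sketch below is not a general OWL reasoner: it assumes a single hard-coded mapping from language code to class and applies the equivalence in both directions. The ex: individuals are invented.

```python
# Toy classifier for one owl:hasValue restriction; not a general OWL reasoner.
LANG_CLASSES = {"en": "exp:English"}  # language code -> class defined by description

def classify(triples):
    """Apply the equivalence in both directions; one pass suffices here."""
    facts = set(triples)
    code_of = {cls: code for code, cls in LANG_CLASSES.items()}
    for (s, p, o) in list(facts):
        if p == "exp:langCode" and o in LANG_CLASSES:
            facts.add((s, "rdf:type", LANG_CLASSES[o]))   # code implies class
        if p == "rdf:type" and o in code_of:
            facts.add((s, "exp:langCode", code_of[o]))    # class implies code
    return facts

data = {
    ("ex:a", "exp:langCode", "en"),
    ("ex:b", "rdf:type", "exp:English"),
}
inferred = classify(data) - data
for triple in sorted(inferred):
    print(triple)
```

Here ex:a is classified as exp:English because of its code, and ex:b receives the code because of its class.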

A theory (a set of statements) in two-variable first order logic with equality is decidable for satisfiability (Mortimer). A canonical model of a theory is a maximal consistent extension of the theory which constitutes a model for the theory (satisfies the theory). A theory is (finitely) satisfiable if it has a (finite) canonical model. Not all ontologies have finite models. OWL allows expressing ontologies which have only infinite models. Transitive functional properties can cause this. So can two independent transitive properties. A language (schema) all of whose theories (instances) are finitely satisfiable is said to have the finite model property (fmp). The finite model property implies decidability. A TF ontology consists of a concept ontology and a term ontology. A concept ontology can be any OWL theory, but a TF term ontology is finitely satisfiable: the TFS schema has no existential axioms which could lose fmp. A TFS.owl term ontology has a finite inferential closure (materialization). Its size relative to the axiom set depends on the constructs used in the term ontology. (For a survey of description logic complexity results see Zolin.)

Here is an example of a sample expression ontology as an axiom set and its theorems. The theorems were generated with pellet extract. The closure is the axioms plus the theorems. Theorems can be checked against the axiom set with pellet entail. The TF tool pellet4tf can be used to extend an ontology with entailments relative to a given set of axioms.

Show/hide TF axioms

Show/hide TF theorems

Rules

Different varieties of rules have been proposed as extensions of description logic. Besides extending classical reasoning power past the confines of decidable description logic, some types of rules can do nonmonotone reasoning, signature transformation and refactoring (rewriting the vocabulary of resource names and literals). Because rules can express more, they are less well behaved than descriptions. For the same reason, there is less agreement about what rules should be like. There is no W3C recommendation for a rule language, just a W3C recommendation for a rule interchange format RIF . A well known Semantic Web rule language is SWRL. SWRL is partially supported by Pellet .

Property inheritance with SWRL

SWRL is a W3C submission for a Semantic Web rule language. The proposal extends the set of OWL axioms with if-then rules. It thus enables such rules to be included in an OWL knowledge base. The proposed rules take the form of an implication between an antecedent (body) and a consequent (head). The intended meaning can be read as: whenever the conditions specified in the body hold, the conditions specified in the head must also hold.

The property inheritance method of the previous section covers inheritance of properties from a class to its instances. For inheritance of properties over other properties, rules are needed. The following rule (in ad hoc human readable form) expresses a source inheritance rule: the sources of a term include the sources of its designation.

Term(term) ^ hasDesignation(term,exp) ^ Expression(exp) ^ hasSource(exp,src) ^ Source(src) -> hasSource(term,src) .
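
To make the reading of the rule concrete, here is a minimal Python sketch that applies it by forward chaining over a set of triples. It is not how Pellet evaluates SWRL; class and property names are written without prefixes as in the rule above, and the individuals are borrowed from the sample ontology only for illustration.

```python
def apply_source_rule(facts):
    """Forward-apply the source inheritance rule once and return the new triples."""
    types = {(s, o) for (s, p, o) in facts if p == "rdf:type"}
    derived = set()
    for (term, p, exp) in facts:
        if p != "hasDesignation":
            continue
        if (term, "Term") not in types or (exp, "Expression") not in types:
            continue
        for (e, q, src) in facts:
            if e == exp and q == "hasSource" and (src, "Source") in types:
                derived.add((term, "hasSource", src))   # head of the rule
    return derived - set(facts)

facts = {
    ("en-example-N_-_exp:Example", "rdf:type", "Term"),
    ("exp1:en-example-N", "rdf:type", "Expression"),
    ("meta1:LCarlson", "rdf:type", "Source"),
    ("en-example-N_-_exp:Example", "hasDesignation", "exp1:en-example-N"),
    ("exp1:en-example-N", "hasSource", "meta1:LCarlson"),
}
print(apply_source_rule(facts))
# {('en-example-N_-_exp:Example', 'hasSource', 'meta1:LCarlson')}
```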

Pellet supports reasoning with SWRL rules. A file that contains SWRL rules is loaded into Pellet and rules are parsed and processed. SWRL rules can be mixed with OWL axioms in an ontology. Rules will be applied to only named individuals in the ontology. Pellet supports all SWRL atoms described in the specification. SWRL builtins include among other things string processing primitives (substring search, concatenation etc.), so it is possible (though tedious) to create URIs with rules. As of Pellet 2.0, Pellet supports nearly all of the SWRL builtins .

Role maps

OWL 1 can express inheritance of classes: property slots and properties with fixed value. It cannot express inheritance of property triples between individuals (that takes three variables: if x has property z then y has z). For instance, conventionally, in TBX the scope of a source indication depends on its place in the tree. The interpretive convention is that the source indication gets inherited down the XML tree from the entry to terms in it. OWL 1 cannot make this inheritance convention explicit. It has to be expressed in other ways, for instance with queries or rules. (Compare section on rules and discussion .)

SWRL is undecidable since it can encode arbitrary role maps. Role maps are inclusions between role chains, for example a father's brother is an uncle . Role maps as a whole are a well-known undecidable class. Inheriting a property (e.g. hasSource) along another property (e.g. designationOf) can be formulated as a role map: hasSource includes designationOf hasSource. It turns out that with appropriate restrictions, this particular case is tractable. Decidability can be preserved by restricting expressivity to acyclic role inclusion axioms. These are sufficient for expressing property inheritance. An OWL 2 construct (complex property inclusion with an ObjectPropertyChain in a SubObjectPropertyOf axiom) codes this. See Horrocks/Sattler 2002 . Compare also OWL 2 Rules and this presentation.

The following sample SWRL in Turtle format exemplifies the use of SWRL rules to express the inheritance of source indications from expressions to terms:

Show/hide TF rules

The rule makes the example term inherit the source indication of its designation, as shown by the following pellet test run. The point of the example is that term en-example-N_-_exp:Example has inherited a source from expression en-example-N.

pellet realize TFRules.owl
owl:Thing
  exp:Expression - (exp1:en-example-N)
  meta:HasSource - (en-example-N_-_exp:Example, exp1:en-example-N)
  meta:Source - (meta1:LCarlson)
  term:Term - (en-example-N_-_exp:Example)

OWL 2 allows defining compositions of object properties as follows:

<!-- http://tfs.cc/term/designates -->
<owl:ObjectProperty rdf:about="&term;designates">
  <owl:propertyChainAxiom rdf:parseType="Collection">
    <rdf:Description rdf:about="&term;designationOf"/>
    <rdf:Description rdf:about="&term;hasReferent"/>
  </owl:propertyChainAxiom>
</owl:ObjectProperty>

The relation term:designates holds between an expression and a concept when there is a term whose designation is the expression and whose referent is the concept. (Its converse, term:designatedBy, holds between the concept and the expression.) The same technique can be used to define property inheritance; for instance, the property meta:hasSource may be made to include the composition term:hasDefinition o meta:hasSource.
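
The effect of the chain axiom (term:designationOf followed by term:hasReferent) can be sketched as a relational join. The Python below is a toy with invented individuals, not a description of how an OWL 2 reasoner implements property chains.

```python
def compose(facts, first, second, result):
    """Derive (x, result, z) wherever (x, first, y) and (y, second, z) hold."""
    by_subject = {}
    for (s, p, o) in facts:
        if p == second:
            by_subject.setdefault(s, set()).add(o)
    return {(x, result, z)
            for (x, p, y) in facts if p == first
            for z in by_subject.get(y, ())}

facts = {
    ("exp:Foo", "term:designationOf", "term:FooTerm"),      # expression -> term
    ("term:FooTerm", "term:hasReferent", "ont:FooConcept"),  # term -> concept
}
print(compose(facts, "term:designationOf", "term:hasReferent", "term:designates"))
# {('exp:Foo', 'term:designates', 'ont:FooConcept')}
```

The same helper covers property inheritance: calling it with a chain whose second member and result are both meta:hasSource copies source indications along the first property.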

TF supports limited ontology transformation using SPARQL CONSTRUCT queries with pellet4tf query -M and -m options. The TF rewrite utility can do refactoring of TF URIs. See also SPARUL in the next section.

Querying as reasoning

A query language expresses questions about a dataset (set of models). The answer may be yes/no, a list of bindings of answer values to variables (question words) in the query, or another set of triples. The answer to a query is computed by a query engine. Querying is one if not the main way of accessing knowledge stored in a TF repository. The Web of Data concept builds on a network of linked data consisting of machine readable query endpoints parallel to the current network of human readable websites.

In Hintikka's logic of questions, an answer to a question is a set of formulas in epistemic logic that entails the question, expressed as another epistemic formula. Consider the answers we won and we did not win to the question did we win (or not). The logic is that the answer formula (I know that) we won entails the question formula I know that we won or I know that we did not win. Since the answer matches the question, it can be abbreviated to yes or no.

Question words are quantifiers into an epistemic context. I know what something is when there is something I know it to be. When the form of the answer reflects that of the question, an answer to a question can be abbreviated to a list of (bindings of) values to question words. So one can answer Who won WW2? with USA, Britain, ... , instead of USA won WW2 and Britain won WW2 and ... . This type of answer to a query constitutes a set of bindings of values to question variables to produce instances of the query in the data. But note that Hintikka's definition of answerhood also covers indirect answers, where inference intervenes between the question and the answer, or between the answer and the dataset.

In query languages, question variables are distinguished from the rest in some way depending on the query language. Query languages may have imperative variants as well, i.e. they don't just answer questions but change things on the basis of the answer to a query.

Many query languages for RDF and OWL have been proposed, including RDQL, SERQL and SPARQL for RDF or OWL-QL, SPARQL-DL and SAIQL for OWL.

SPARQL

SPARQL is the best known RDF query language and a W3C recommendation. Its syntax resembles that of the well established relational database query language SQL, which may be one of its selling points. A major feature missing compared to SQL is UPDATE queries. SPARQL 1.1 Update has been a W3C Working Draft since 22 October 2009. Once standard, SPARQL Update can do some of the jobs now done by pellet4TF and edit4tf tools.

Other SQL features missing from SPARQL are aggregate functions like COUNT and nested queries (a query whose dataset is another query).

SPARQL query answering is based on the semantics of RDF. A query is answered by a set of subgraphs of the dataset that match (by graph isomorphism) the query graph. With this semantics, SPARQL only finds triples that are explicitly listed in the dataset. There is a provision in the SPARQL standard to extend SPARQL query answering to other entailment regimes beyond subgraph matching. Given such extensions, SPARQL can be used to extract entailments from a dataset.
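
The limitation of plain subgraph matching is easy to see in a toy matcher. The Python below is a sketch under simplifying assumptions (triples as tuples, variables as strings starting with "?", invented ex: names); it is not the algorithm used by ARQ or Pellet.

```python
def match(pattern, data, binding=None):
    """Yield variable bindings under which every pattern triple occurs in data."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in data:
        new = dict(binding)
        for want, got in zip(first, triple):
            if want.startswith("?"):
                if new.setdefault(want, got) != got:   # variable already bound elsewhere
                    break
            elif want != got:                          # constant mismatch
                break
        else:
            yield from match(rest, data, new)

data = {
    ("ex:Noun", "rdfs:subClassOf", "ex:Expression"),
    ("ex:example-N", "rdf:type", "ex:Noun"),
}
print(list(match([("?x", "rdf:type", "ex:Noun")], data)))        # one binding
print(list(match([("?x", "rdf:type", "ex:Expression")], data)))  # [] : entailed, not asserted
```

Under an RDFS or OWL entailment regime the second query would also return ex:example-N, since its membership in ex:Expression follows from the subclass axiom.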

SPARQL Update

SPARQL Update, an extension of SPARQL for updating RDF data, has been long in coming. The most recent version is the SPARQL 1.1 Update W3C Working Draft of Jan 5, 2012. SPARQL Update is supported by Jena. We plan to add it to the TF toolkit in the future. The Pellet4TF query engine provides a poor man's SPARQL transformer through the matrix and minus parameters.

The new working draft also contains an extension for federated queries with SPARQL. Once supported, this should provide another way to broadcast queries between TF sites.

According to the SPARQL standard, a SPARQL query only provides answers which simply entail the query. A query graph matches a data graph if the data graph entails the query. Blanks count as existentially bound variables. This makes writing SPARQL queries on data containing blanks a little tricky. In particular, one cannot change properties on an anonymous node using a SPARQL query, because the result of the query will not share blanks with the data. This problem can be circumvented by temporarily converting blanks in the data to named resources using the rewrite utility for the duration of a query and then back to blanks afterward. The rewrite deblank/reblank options do just that. From a logical point of view, the deblank transform is an existential instantiation (skolemization) of a graph, and the reblank transform is its existential generalization.
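
The round trip can be sketched as follows. This Python toy only illustrates the idea of skolemizing and un-skolemizing blank nodes; the urn:skolem: prefix is a hypothetical choice, not the naming scheme of the TF rewrite utility.

```python
SKOLEM_PREFIX = "urn:skolem:"   # hypothetical prefix, not the one TF rewrite uses

def deblank(triples):
    """Replace blank node labels (_:x) with named resources (skolemization)."""
    def name(node):
        return SKOLEM_PREFIX + node[2:] if node.startswith("_:") else node
    return {(name(s), p, name(o)) for (s, p, o) in triples}

def reblank(triples):
    """Turn skolem names back into blank node labels (existential generalization)."""
    def blank(node):
        if node.startswith(SKOLEM_PREFIX):
            return "_:" + node[len(SKOLEM_PREFIX):]
        return node
    return {(blank(s), p, blank(o)) for (s, p, o) in triples}

data = {
    ("ex:term", "term:hasDesignation", "_:b0"),
    ("_:b0", "exp:langCode", "en"),
}
named = deblank(data)

# With a name in hand, the anonymous node can be addressed and edited.
old = ("urn:skolem:b0", "exp:langCode", "en")
new = ("urn:skolem:b0", "exp:langCode", "en-GB")
edited = reblank((named - {old}) | {new})
print(sorted(edited))
```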

The TF SPARQL query engine with pellet OWL reasoner is not complete with respect to OWL semantics. For instance, the pellet MIXED query engine answers yes to query ASK { exp:Bar rdf:type term:Designation } against the dataset below, so it is able to apply the definition of term:Designation. On the other hand, ASK { exp:Baz term:designationOf _:something } is answered no, because the reasoner does not do existential instantiation.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix term: <http://tfs.cc/term/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix exp: <http://tfs.cc/exp/> .

term:Designation owl:equivalentClass [
    rdf:type owl:Restriction ;
    owl:onProperty term:designationOf ;
    owl:someValuesFrom owl:Thing
] .

exp:Bar term:designationOf exp:Foo .
exp:Baz rdf:type term:Designation .

Realization in DL reasoning finds the most specific classes that an individual belongs to; i.e., realization computes the direct types for each individual. Realization can only be performed after classification since direct types are defined with respect to a class hierarchy. Using the classification hierarchy, it is also possible to get all the types for each individual.
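
Given a classified hierarchy, computing direct types amounts to discarding every type that is implied by a more specific one. The Python sketch below assumes the hierarchy is already available as a direct-parent map and that all types of each individual are known; the ex: names are invented.

```python
def superclasses(cls, parents):
    """All strict superclasses of cls given a direct-parent map."""
    seen, stack = set(), list(parents.get(cls, ()))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen

def realize(all_types, parents):
    """Map each individual to its direct (most specific) types."""
    direct = {}
    for individual, types in all_types.items():
        implied = set()
        for t in types:
            implied |= superclasses(t, parents)
        direct[individual] = set(types) - implied
    return direct

parents = {"ex:Noun": {"ex:Expression"}, "ex:Expression": {"owl:Thing"}}
all_types = {"ex:example-N": {"ex:Noun", "ex:Expression", "owl:Thing"}}
print(realize(all_types, parents))   # {'ex:example-N': {'ex:Noun'}}
```

Going the other way, all types of an individual are its direct types plus their superclasses, which is why the classification hierarchy is enough to recover them.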

Realization is typically a slow undertaking, since it in effect multiplies the number of instances with the number of classes. It usually does not make sense to ask TF DESCRIBE queries using the Mixed engine on larger ontologies, since the query can easily take more time than one cares to wait. One solution is to do realization on a large ontology once in a while and cache the results for direct consultation. Another way to go is to apply the TF Stacked engine.

Answer formatting in SPARQL is restricted. The answer of a SPARQL query can be yes/no (ASK queries), a list of bindings as text, XML, or JSON (SELECT), or a set of RDF triples constructed from the answer bindings and constant strings (CONSTRUCT). The result set can be filtered with functions that express syntactic conditions on resource names and literals. But there is no way to construct new resource names and literals from the answer bindings. This restricts the use of SPARQL as an RDF graph transformation tool (but see SPARQL-U or SPARUL). Such facilities are included in Semantic Web rule languages.

Ontology imports

Industrial strength ontologies, for instance in the biomedical domain or in the Linked Data initiative, can become quite big. We are talking of hundreds of thousands, millions, or billions of entities. At that scale, ontologies become unmanageable unless some kind of divide and conquer tactic is available. (Desktop ontology editors can have difficulties with tens of thousands of items, not to speak of us human users who get lost with hundreds.) There is a growing literature on ontology sharing and importing, ontology decomposition, modularity etc. On the level of the OWL standard, there is as yet little support for all this. The only primitive in the standard is the owl:imports element that creates import relationships between ontology resources. In terms of RDF graphs, an ontology is just another named resource, a graph node. A URL is identified as an ontology in a statement like

<http://puls.cs.helsinki.fi/med> rdf:type owl:Ontology .

What IS an ontology, concretely? An ontology is identified by an ontology resource URI ("ontology header"), and it should probably consist of nodes (resources and literals) and statements (triples) describing them. But which triples belong to an ontology? The OWL standard does not explicitly say. The connection between an ontology URI and the associated resources and triples is not expressed in OWL. There are conventions based on physical document containment relations and URI sharing, but no explicit statement in OWL to the effect that a given triple belongs to a given ontology.

The association of an ontology "header" URI to an ontology document URL is physical and not standardized. The OWL recommendations talk loosely about "the ontology header", but there is nothing to enforce existence or uniqueness of such a header in an ontology document. There is no semantic or syntactic connection between an ontology document, the ontology header, and the triples in the document, any more than there is one between the name of a file and its contents. Membership of a statement in an ontology might be represented in RDF using reification, but it is not done in practice. There may be good reasons for not making membership of a triple in an ontology explicit. At least doing so with reification would be quite inefficient. Cf. Wikipedia .

The archetypal OWL RDF/XML ontology document format for simple ontologies was one where the ontology document URL matches the (only) ontology resource URI described in the ontology header, and the ontology URI is the URI prefix of all the resources "defined" in the ontology. But this is too simple to be true in real life.

In the default setup, resources and statements are "defined" in an ontology document and "belong to" that ontology. What that might mean semantically is nowhere defined. Triples in an ontology document are normally not marked in any way as being "from" one ontology rather than another. There is an RDFS utility property rdfs:isDefinedBy for pointing a resource as belonging to a vocabulary, but it is vaguely defined and rarely used. The same statement may occur in many ontology documents. Import statements create an imports graph of the ontology resources, and as a side effect pool together triples from the ontology document associated to (accessible by) the ontology URI. Do (all of) these triples thereby belong to the importing ontology? Hard to say. What triples "belong" in an ontology, and how the namespaces of the triples are related to that of the ontology URI, is not regulated.

Having the same URI prefix as an ontology URI proves nothing. An ontology's triples may (and typically do) contain vocabulary sharing a URI prefix with the ontology URL, but that is again conventional. An ontology document may just as well contain items that do not have the same base URI as the ontology (and typically it does). There is no constraint between an ontology URI and the URIs of other resources in "it" (i.e. its defining document). In particular, nothing can be inferred from such conventions in RDF or OWL.

In OWL 1, the external URL of an imported ontology and the ontology header URI were related only indirectly. The standard says "An OWL document consists of optional ontology headers (generally at most one)". Protege 3 used to require that an imported document's xml:base attribute matched the imported URI. This convention was predicated on the default setup explained above. (See http://www.w3.org/TR/owl-ref/#imports-def . For OWL 2, see http://www.w3.org/TR/owl2-syntax/ .) This made it hard to locate imports. In OWL 2, imports are identified by URL or the document location, not by "internal" ontology URI. See http://protegewiki.stanford.edu/wiki/How_Owl_2.0_Imports_Work .

Since there is no necessary association between an ontology document URL and an ontology (resource) URI ("header"), many documents could contain and describe the same ontology resource (like any other resource), and one document could describe more than one ontology resource. It is just a convention related to Semantic Web addressing orthodoxy that an association between an ontology URI and an ontology document (accessible at that) URI exists and is one-to-one.

What happens if an ontology imports some URL, and that causes loading a document which does not contain an ontology header for the imported URI? Nothing specific happens in fact. If an imported ontology document has no (different) associated ontology resource description, importing it works like an include. If an ontology is too big to have in one file, it can be split into many in this way.

Ontology URLs are important for ontology imports. A statement of form

<http://puls.cs.helsinki.fi/med> owl:imports <http://puls.cs.helsinki.fi/> .

tells that the subject ontology imports the object ontology. Even if the purpose is just to split the ontology into manageable pieces, technically, the pieces become ontologies with their own URI. It may be safer to avoid using relative URIs in writing ontologies because they may cause trouble later if the ontology is split.

Resolving ontology URIs vs. URLs in ontology import statements can cause problems. An ontology loader goes and tries to locate a document describing the object ontology at the ontology URL. Failing that, it may ask for an association of the ontology URL with some file. Given a file, the loader looks for an RDF description about the ontology URL in it. If one is found, the file is loaded as a document describing the object ontology.

Jena resolves the object URI of an import statement to a physical URL against an external catalogue ont-policy.rdf . OWLAPI does not implement a catalogue mechanism, it is up to the user of the API to implement one. Protege 4 implements an XML catalogue mechanism. The current Pellet reasoner just fails if an imported URL does not point to a readable ontology. Further discussion:
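
The general shape of catalogue-based import resolution can be sketched in Python. This is an assumption-laden toy: a dict stands in for the catalogue (it does not model the ont-policy.rdf or XML catalogue formats), another dict stands in for the web and file system, and the location file:med.owl is invented.

```python
def load_with_imports(uri, catalogue, documents, loaded=None):
    """Collect the triples of `uri` and of everything it imports, transitively."""
    loaded = {} if loaded is None else loaded
    if uri in loaded:
        return loaded                      # already visited; import cycles are harmless
    location = catalogue.get(uri, uri)     # fall back to the URI itself as the location
    if location not in documents:
        raise LookupError("cannot locate imported ontology " + uri)
    triples = documents[location]
    loaded[uri] = triples
    for (s, p, o) in triples:
        if p == "owl:imports":
            load_with_imports(o, catalogue, documents, loaded)
    return loaded

MED = "http://puls.cs.helsinki.fi/med"
ROOT = "http://puls.cs.helsinki.fi/"
catalogue = {MED: "file:med.owl"}          # hypothetical local copy
documents = {
    "file:med.owl": {(MED, "rdf:type", "owl:Ontology"), (MED, "owl:imports", ROOT)},
    ROOT: {(ROOT, "rdf:type", "owl:Ontology")},
}
print(sorted(load_with_imports(MED, catalogue, documents)))
```

A loader without any fallback behaves like the Pellet case described above: the LookupError is raised as soon as one import cannot be located.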

http://protege.stanford.edu/doc/owl/owl-imports.html
http://www.nabble.com/Creating-local-repository-for-a-project-td24969842.html
http://protegewiki.stanford.edu/index.php/How_Owl_2.0_Imports_Work
http://www.w3.org/2007/OWL/wiki/LC_Responses/TR1
http://www.w3.org/2007/OWL/wiki/Imports

In at least some versions of Protege the convention is that the xml:base attribute identifies "the" ontology defined in an ontology file. The idea behind this convention seems to be that the ontology resource uri should be the same as the xml:base prefix of the document containing it and that the shared prefix should also identify resources belonging to the ontology. A document imports the ontology that is mentioned in its xml:base element. The base attribute value must match the external URL of the ontology document, otherwise the ontology won't load. TF does not follow this convention. It prevents splitting an ontology into several files without changing the original namespaces. Things are supposed to change with Protege 4.1. In Protege 4.0 (build 113) import locations of Turtle (.ttl) files apparently are not cached, unlike imports of RDF/XML (.owl) files. Apparently in Protege 4, owl RDF/XML file locations are cached in per-directory XML catalogue files (catalogv00.xml), Turtle files are not.

The TF ontology has contended with the question of ontology imports from its beginning as the Tekes 4M project ontology. Already at that stage, term ontologies formed an inheritance hierarchy using the owl:imports primitive. In terms of the following figure, loading the company instance database in the top right corner would import the ontologies needed by it directly or indirectly. For example, a ship engine numbered #12345 could belong to a class xyz of engines described in a company ontology, which could define it as a given type of combustion engine ABC, described in a more general ontology of diesels, and so on up.

Bridge ontologies

For importing third party OWL ontologies to TF, one method is using hand-made bridge ontologies. Bridge ontologies are (preferably relatively small) ontologies which interface between other ontologies. They import all or part of the component ontologies plus define the contact points where the imported entities "plug in" to one another, by adding properties or concepts to relate them.

For instance, an excerpt of the YSO ontology is embedded into TermFactory using the bridge ontology YSO_bridge.owl. It embeds the root concept(s) of the excerpt under the appropriate node of the TermFactory ontology, and places any implied concepts imported from third party ontologies by the excerpt under node YSO_bridge in the TermFactory bridge namespace. (Without a bridge ontology, recursive imports may clutter the recipient ontology with irrelevant 'orphan' root concepts.)

Query Imports

Importing by query

Working experience with ontologies shows that large ontologies take a lot of space and time to process. The approach to ontology work taken in TF is "small models": instead of working with a complete ontology, make it easy to extract and merge working subsets from larger ontologies.

Define a facility for importing terms from one ontology to another, call it query importing. An ontology may specify the collection of terms to import from an external ontology through some query language expression, saying something like "import from our India subsidiary all concepts subordinate to our concept SurplusTax plus their English and Hindi designations and definitions".

This has advantages over manual importing. First, it saves manual work and errors. Second, since the query fetches the concepts by description, it also fetches the current version of the collection each time from the queried site. (It is of course possible to query a given revision of a collection from a site that is under some kind of revision control.) Third, the imports stay small and faster to compute with.

(version 0.9) (deprecated) A TF-specific meta:imports property might contain an inline OWL query language (SPARQL-DL) query specifying a part of the ontology to import.

The TermFactory solution to importing queries conforms to standard OWL syntax and Web addressing orthodoxy. The orthodoxy is to make web content associated to a given URI simply available at that location (see e.g. http://www.w3.org/Addressing/ ).

Importing an ontology from a query

Ontologies easily become large and hard to manage. TermFactory aims to break ontologies up into manageable parts that can be freely combined.

In the TermFactory solution, ontologies are fetched and sliced with ontology queries. Queries can be used to pick desired excerpts from a vocabulary. Queries can be named and their results imported into other ontologies.

Example:

A TF query import URI points to a TF service which returns the result of a query as a model (possibly one already cached in the repository database or filesystem).

Here is an example of a query import:

The following query URI uses the TF URI retry facility to fetch the ontology associated with a URI.

http://localhost:8080/TermFactory/query?uri=%3Chttp%3a%2f%2ftfs.cc%2font%2fFinnish%3E

The following query URI also specifies repositories to serve as the dataset of the query. Items in a list of repositories to search are delimited by a URL-encoded newline character %0A. More than one repository parameter can be given per query string.

http://localhost:8080/TermFactory/query?uri=http%3a%2f%2ftfs.cc%2font%2fFinnish&repos=repo1.uri%0arepo2.uri%0a
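
Hand-encoding such URIs is error-prone, so it may help to generate them. The Python sketch below uses only the standard library; the helper name query_import_uri is invented, and the parameter names uri and repos are taken from the examples above rather than from a formal description of the TF query service.

```python
from urllib.parse import quote, unquote

def query_import_uri(service, uri, repos=()):
    """Build a query URI with a percent-encoded `uri` and newline-delimited `repos`."""
    query = "uri=" + quote(uri, safe="")
    if repos:
        # Each repository is followed by an encoded newline (%0A), as described above.
        query += "&repos=" + "".join(quote(r + "\n", safe="") for r in repos)
    return service + "?" + query

url = query_import_uri(
    "http://localhost:8080/TermFactory/query",
    "http://tfs.cc/ont/Finnish",
    ["repo1.uri", "repo2.uri"],
)
print(url)
print(unquote(url.split("uri=")[1].split("&")[0]))   # decodes back to the ontology URI
```

quote emits upper-case hex digits (%3A, %2F, %0A); percent-encoding is case-insensitive, so these are equivalent to the lower-case forms used in the examples.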

Here is an example of using the result of one query in the dataset of another.

http://localhost:8080/TermFactory/query?pattern=Finn&repos=http%3a%2f%2ftfs.cc:8080%query?pattern=Country&repos=TFS.owl%0aPlace.owl%0a%0a%2fDomain%2fGeography%2fCountry%2f%0a

What this does is use the local query engine to query for items related to Finns in a dataset composed of (i) the result of a query to TF schema about countries and (ii) the local ontology index under countries. Note the double newline (percent encoded as %0a%0a ) marking the end of the embedded query's repo parameter.

If the http server at grapson.com is further configured to redirect its incoming uris (perhaps just those of them that satisfy some further condition) to the TermFactory webapp, the import element can be made maximally simple:

<owl:Ontology rdf:about=""> <owl:imports rdf:resource="http://tfs.cc/ont0/English"/> </owl:Ontology>

This request is first rewritten by the grapson.com web server to a TermFactory query uri which is looked up or queried (from repositories specified in the query uri or in the service's settings). The resulting model is what gets imported.

DESCRIBE queries

For TF, the notion of a terminology entry is not built in on the level of data structure. Facts about a given term can be sought arbitrarily far in the graph surrounding ("about") the term. One way to give content to this relative notion of terminology entry is through a query language. The TF DESCRIBE query facility is a way in TF to implement various notions of terminology entry. It allows users to define a more or less fixed notion of what should be included in a terminology entry. Alternative views are possible, such as a concept-based view, where an entry is identified by a concept and the terms related to that concept, a term-based view, where a term is selected as an anchor, or a lemma-based lexicographical view, where an expression serves as the key of the entry. Each view corresponds to its own DESCRIBE query template.

One level closer yet to the eye of a human user, there is the level of terminography or entry layout. Here, decisions are made about the order and grouping of the information returned by a given DESCRIBE query template. In part, such decisions can be already supported in the query language (the query language has some control over the order or grouping the query results). But it can be further refined as the results are laid out on a given user interface.

The SPARQL query language recommendation leaves the graph returned by a DESCRIBE query unspecified. The DESCRIBE form returns a single RDF graph containing RDF data about resources. This data is not prescribed by a SPARQL query, where the query client would need to know the structure of the RDF in the data source, but, instead, is determined by the SPARQL query processor.

The query pattern is used to create a result set. The DESCRIBE form takes each of the resources identified in a solution, together with any resources directly named by URI, and assembles a single RDF graph by taking a "description" from the target knowledge base. The description is determined by the query processor implementation and should provide a useful description of the resource, where "useful" is left to nature of the information in the data source.

If a data source has no information about a resource, no RDF triples are added to the result graph, but the query does not fail.

The working group adopted DESCRIBE without reaching consensus. The objection was that the expectations around DESCRIBE are very different from CONSTRUCT and SELECT, and hence it should be specified in a separate query language. If you have input to this aspect of the SPARQL that the working group has not yet considered, please send a comment to public-rdf-dawg-comments@w3.org.

See also proposals for a generic DESCRIBE query definition .

It may make sense for TF to define DESCRIBE query result specially so that the graph from a DESCRIBE query is something satisfying a traditional notion of "terminology entry". One way to proceed might be to take DESCRIBE as shorthand for a default query clause specified for each given ontology source, say in the common registry.

TermFactory DESCRIBE queries have been implemented using Jena and Pellet in TF subdirectory io in svn. The TermFactory query engine Query4TF extends the Pellet engine to DESCRIBE queries. It can be run from command line under name pellet4tf . Each DESCRIBE query defines some notion of a TF entry. The TF DESCRIBE query works as follows.

Term queries

Unlike traditional terminology, TermFactory is not based on term entries. In an ontology, information consists of independent statements describing concepts, expressions and their relations, from which many kinds of descriptions can be composed. The traditional concept-oriented term entry is just one alternative composition. TermFactory composes different views of terms with term queries (DESCRIBE queries). Example:

Use a DESCRIBE query pattern to select the entities to describe (minimally, DESCRIBE r1 r2 ... ). The TF DESCRIBE query

  • iterates customisable CONSTRUCT queries on each selected entity (instance or class) to the desired depth.
  • given a reasoner, shows the class taxonomy to the desired depth for each class in the first result set.

The TermFactory DESCRIBE query handler applies to each item in the result set of the DESCRIBE query some user-customisable CONSTRUCT query recursively to the depth given by option TF_DESCRIBE_DEPTH. In addition, for each class in the initial result set, it includes the ontology taxonomy (superclass and subclass tree) to the depths given by TF_DESCRIBE_UP and TF_DESCRIBE_DOWN. (The taxonomy options are currently disabled.)

The PELLET query engine applies the PELLET reasoner so the result set can contain triples which are not asserted but just entailed by the query dataset. For example, the TF schema ontology does not contain the statement that English is the referent of the Chinese term for English, but the query engine infers it from the converse statement (that the term has English as its referent). On the downside, the reasoning is slow. It is more efficient to convert the ontology to entry normal form offline and run a SPARQL query on the normalised ontology at runtime.

The default queries to be applied to the resources to describe are set in the pellet configuration file etc/tf.properties . Currently, pellet4tf allows specifying different DESCRIBE queries depending on whether one does a SPARQL query (ARQ without Pellet) or a PELLET query. The SPARQL query is defined by TF option TF_SPARQL_QUERY , by default etc/tfs.sparql , the PELLET query is defined by option TF_PELLET_QUERY , by default, etc/tfs.sparql .

The default query basically lists all the triples in the model that contain the resource/s to describe as subject or object. The filter clause FILTER ( ?invp != rdf:type ) is a SPARQL way of telling not to include type triples where the resource occurs as object (there would typically be too many of them to be of interest). The filter can be expanded with more conditions, or the whole query rephrased, to generate more focused descriptions.

The Pellet query engine used in TF solves complex queries (containing triples with variables in both subject and object position) bottom up, by building large intermediate results that make such queries slow on bigger ontologies. One type of query that is bound to hang on a big ontology is an open query on the type relation (find all type triples, i.e. ontology realization ).

The hardcoded query logic built into the TF DESCRIBE query execution does things that are hard to say in SPARQL:

  • building a class tree of a class in up and down direction to a given depth (currently disabled)
  • recursive description of named resources to a given depth
  • blank node closure

The SPARQL DESCRIBE queries can then be kept simple, basically saying "list given properties for given subject". To get different results at different nodes, make the WHERE clause a union of alternatives.

Some RDF/OWL/TF specific recursive query logic used in executing TF DESCRIBE queries is collected into the Java class ./src/com/grapson/tf/que/jena/DescribeSelector.java . It is a Jena selector that decides which resources mentioned in a query result should be described further. DescribeSelector is one possible cause if a Stacked engine query result misses entailments. To change the constraints, there is nothing to do at present but modify and recompile DescribeSelector.

Asked to describe a concept, the TF DESCRIBE query engine recursively queries the graph neighborhood of all resources in the result set with a canned CONSTRUCT query and applies the same procedure to new nodes in the result set. The default recursion depth is 2 for named resources. Blank nodes are not subject to the depth limitation. The canned query can be set separately for SPARQL and Pellet query engines. With factory settings, the canned DESCRIBE query collects all properties of a resource one deep. The default behavior can be modified in many ways:

  • start from the concept instead of the class (the description will include the class)
  • increase recursion depth
  • complicate the sparql query
  • recompile DescribeSelector
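The recursion described above can be sketched over an in-memory set of triples. This is only an illustration under stated assumptions, not the logic of DescribeSelector: the string representation of nodes, the "_:" and double-quote markers for blank nodes and literals, and the function names neighborhood and describe are all hypothetical. The canned one-deep query is approximated as "all triples with the resource as subject or object, minus type triples where it is the object".

```python
from collections import deque

RDF_TYPE = "rdf:type"

def is_blank(node):
    # Assumption: blank nodes are written with a "_:" prefix, as in Turtle.
    return node.startswith("_:")

def is_literal(node):
    # Assumption: literals are stored as strings starting with a double quote.
    return node.startswith('"')

def neighborhood(graph, resource):
    """Stand-in for the canned one-deep query: triples with the resource as
    subject or object, except type triples where it occurs as object."""
    return {(s, p, o) for (s, p, o) in graph
            if s == resource or (o == resource and p != RDF_TYPE)}

def describe(graph, start, depth=2):
    """Describe the start resources recursively. Named resources are followed
    up to `depth` levels; blank nodes are always followed (blank node closure)."""
    result = set()
    best = {}                        # lowest level at which each node was described
    queue = deque((r, 0) for r in start)
    while queue:
        node, level = queue.popleft()
        if node in best and best[node] <= level:
            continue                 # already described at this level or a shallower one
        best[node] = level
        triples = neighborhood(graph, node)
        result |= triples
        for s, _, o in triples:
            for nxt in (s, o):
                if nxt == node or is_literal(nxt):
                    continue
                if is_blank(nxt):
                    queue.append((nxt, level))        # no depth cost for blank nodes
                elif level + 1 < depth:
                    queue.append((nxt, level + 1))
    return result
```

With depth 1 only the start resources are described; with the default depth 2 the resources they mention are described as well, which is why a description of a concept reaches its terms but not yet the properties of their designations.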

The pellet4tf command line application can do command line DESCRIBE queries in the following format:

pellet4tf query -D ont1:China -d ../owl/TFS.owl

@prefix : <http://tfs.cc/owl/TFS.owl#> .
@prefix meta1: <http://tfs.cc/meta1/> .
@prefix meta0: <http://tfs.cc/meta0/> .
@prefix exp0: <http://tfs.cc/exp0/> .
@prefix term1: <http://tfs.cc/term1/> .
@prefix term0: <http://tfs.cc/term0/> .
@prefix ont1: <http://tfs.cc/ont1/> .
@prefix ont0: <http://tfs.cc/ont0/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix exp1: <http://tfs.cc/exp1/> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix ont0: <http://tfs.cc/ont0/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl2xml: <http://www.w3.org/2006/12/owl2-xml#> .
@prefix sem0: <http://tfs.cc/sem0/> .
@prefix term: <http://tfs.cc/term/> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sem: <http://tfs.cc/sem/> .

exp1:en-China-N exp:baseForm "China"^^xsd:string .
exp1:en-China-N exp:catCode "N"^^xsd:string .
exp1:en-China-N term:designationOf term1:en-China-N_-_ont-China .
exp1:en-China-N exp:langCode "en"^^xsd:string .
ont1:China a meta:Object .
ont1:China a ont:Concept .
ont1:China a ont:Country .
ont1:China a owl:Thing .
ont1:China a sem:Meaning .
ont1:China a sem:Place .
ont1:China a sem:Role .
ont1:China term:referentOf term1:en-China-N_-_ont-China .
term1:en-China-N_-_ont-China term:hasDesignation exp1:en-China-N .

The resource to describe is given as the value of option -D. The resource can be given as a URI in angle brackets as shown below (the argument may have to be quoted to prevent the shell from parsing the angle brackets as file redirections):

pellet4tf query -D '<http://tfs.cc/exp/English>'

Or it can be given as a prefixed name (aka qualified name) as above. In the latter case, the prefix must be declared in etc/prefix.sparql .

The query engine used in the selection stage and the engine used in the description stage of a DESCRIBE query should sometimes be different. To find all that is known about some topic, the best bet is to use a reasoner all the way. For editing purposes, it may make good sense to use a reasoner to select relevant items for editing to be sure nothing is missed, but then describe those resources for editing with a plain SPARQL engine. It only makes sense to edit asserted triples, because implied triples will not go away unless one edits the axioms that imply them. Editing is a syntactic operation which should be applied to a minimal nonredundant representation.

To tell the TF query facility to switch engines midway through a DESCRIBE query, the query engine parameter can be of the form ENGINE1+ENGINE2. For the editing scenario above, the parameter value should be MIXED+SPARQL. Parameter value MIXED (or MIXED+MIXED) uses the MIXED engine in both phases. The second engine in the composition is simply ignored if the query is not a DESCRIBE query. The Stacked engine has its engine choices built in, so it cannot be composed with the other engines in this way.
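How such a composed parameter value might be interpreted can be sketched as follows. The function name parse_engines and its return shape are hypothetical and are not taken from the TF sources; only the ENGINE1+ENGINE2 syntax and the rule that the second engine is ignored for non-DESCRIBE queries come from the text above.

```python
def parse_engines(param, is_describe):
    """Split 'ENGINE1+ENGINE2' into (selection_engine, description_engine).

    A single engine name is used for both phases. For queries other than
    DESCRIBE there is no description phase, so the second engine is dropped.
    """
    first, _, second = param.partition("+")
    if not is_describe:
        return first, None
    return first, (second or first)

# The editing scenario: select with MIXED, describe with SPARQL.
selection, description = parse_engines("MIXED+SPARQL", is_describe=True)
```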

Naming resources in TF

A central source of complexity concerns naming of resources in TF. As explained in the section on TF philosophy, the local/global naming problem is very much at the core of TermFactory. The global web resource identifiers (URLs, URIs, IRIs, URNs) are unambiguous all right, but they are also long, ugly, and hard to remember.

Semantic Web orthodoxy only requires that global resource identifiers are monosemous, viz. identify a unique resource. There is no converse requirement that entities associated to resource IDs should be mononymous, viz. have just one URI associated to them. There would be no way to enforce it. For TF, this means that though a given TF URI should identify just one terminology entity, there is nothing to prevent one terminology entity from having many URIs aliased to it. This is a useful insight, because it allows a variety of different URIs to be associated to a given term or other entity depending on need. For some purposes (decentralised creation, search, or debugging), a URI descriptive of the thing named is useful. For other purposes (say, obfuscation, channel independence) an encrypted or character-encoded version of a descriptive URI, or just a simple anonymous numeric URI (version maintenance), may be preferable. The TF rewrite utility supports some forms of such aliasing.

Descriptive identifiers

The main reason to worry about the choice of URIs for expressions and terms is what is sometimes termed the ontology hell . That is the situation (already present) where effectively the same concept is reinvented many times with only slightly varying URI and essentially the same meaning by many authors, who then face the problem of ontology matching to find if they are really talking about the same thing or not. The problem of polysemy besetting natural language is just replaced by an equally confusing problem of synonymy, known in terminology theory as the harmonisation problem between different terminology standards. This problem can be alleviated (not removed) by a naming convention that exposes the similarity of competing conceptualisations.

The TF descriptive naming convention for expressions and terms is a compromise between a human readable name and a perfect hash. The idea is to choose key properties of an expression/term and form the descriptive name from them. The properties are supposed to be real keys in the relational database or OWL 2 hasKey sense, so that they uniquely identify the expression/term. Two different items should not end up with the same descriptive name, and one item should get only one such name. The main brunt of identification is borne by the site URL prefix. The local name of an expression is formed from the expression's language code, base form, and category (usually, the part of speech code is sufficient, but it can be another sufficiently distinguishing tag if not). Different authors should be able to arrive at the descriptive name of an item independently and be reasonably confident that another author adopting the same or similar descriptive name is after the same or a closely related notion. What we want to avoid as far as possible is the need for a separate global catalogue of resources.

The three parts of the local name are separated by hyphens. For example, the English language greeting 'hi' in TFS (if it belonged to the TFS vocabulary) would get the descriptive URI

http://tfs.cc/exp1/en-hi-S

The descriptive label for a term is formed by concatenating the descriptive label of its designation with a namespace prefix for its referent and the referent's local name, separated by hyphen. Since namespace prefixes are not globally registered, this naming convention is only suggestive of the referent of the term, and its main purpose is to serve as a mnemonic distinguisher. The designation and referent parts of the term label are separated by the string "_-_". For example, if TFS had a concept for greetings whose prefix name were sem:Greeting , the sense of English "hi" as a greeting could be labeled descriptively as

http://tfs.cc/exp1/en-hi-S_-_sem-Greeting

Hyphen and underscore are used as separators in the labels because they are the least reserved separator-like characters in the many Semantic Web character sets. These separator strings should not occur inside the parts they separate in a way that could cause ambiguity.
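The naming convention can be summarised in a small sketch. The helper names expression_name and term_name are hypothetical; they only produce the local names, and the namespace to prepend is left to the site.

```python
TERM_SEPARATOR = "_-_"   # separates the designation part from the referent part

def expression_name(lang_code, base_form, category):
    """Local name of an expression: language code, base form and category,
    joined by hyphens."""
    return "-".join([lang_code, base_form, category])

def term_name(expression_local_name, referent_prefix, referent_local_name):
    """Local name of a term: the designation's descriptive name, the separator,
    then the referent's namespace prefix and local name joined by a hyphen."""
    referent = referent_prefix + "-" + referent_local_name
    return expression_local_name + TERM_SEPARATOR + referent

hi = expression_name("en", "hi", "S")
greeting = term_name(hi, "sem", "Greeting")
```

Here hi is the string en-hi-S and greeting is en-hi-S_-_sem-Greeting, matching the examples above. The sketch does not check that the separator strings are absent from the parts themselves.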

The validity of the assumption that the key properties of a descriptive identifier are really keys can be stated and tested in OWL 2 using the OWL 2 construct HasKey, which allows defining keys for a given class. A HasKey axiom states that each named instance of a class is uniquely identified by a (data or object) property or a set of properties; that is, if two named instances of the class coincide on the values of each of the key properties, then these two individuals are the same.
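Outside a reasoner, the effect of such a key can be approximated by grouping named instances on their key values. The sketch below is an illustration only: the dictionary layout and the function name key_collisions are assumptions, and real HasKey reasoning in an OWL 2 reasoner does more (it infers that the colliding individuals are the same).

```python
from collections import defaultdict

def key_collisions(instances, key_properties):
    """Return groups of differently named instances that agree on all key
    properties, i.e. the names a HasKey axiom would identify with one another.

    `instances` maps an instance name to a dict of property values.
    Instances lacking a value for some key property are skipped.
    """
    groups = defaultdict(set)
    for name, properties in instances.items():
        key = tuple(properties.get(p) for p in key_properties)
        if None not in key:
            groups[key].add(name)
    return {key: names for key, names in groups.items() if len(names) > 1}

expressions = {
    "exp1:en-hi-S":    {"langCode": "en", "baseForm": "hi",    "catCode": "S"},
    "exp1:x42":        {"langCode": "en", "baseForm": "hi",    "catCode": "S"},
    "exp1:en-China-N": {"langCode": "en", "baseForm": "China", "catCode": "N"},
}
same = key_collisions(expressions, ("langCode", "baseForm", "catCode"))
```

In this made-up data the anonymous name exp1:x42 collides with exp1:en-hi-S, so the two would be treated as aliases of one expression.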

There is no equally catholic naming convention for concepts in TF. However, when the rewrite relabeler finds a concept without a name associated to a term that has an expression designating it, the relabeler tries to use that expression's base form to generate a camel cased name for the concept. For instance, a blank concept designated by an expression with base form 'concept without name' will get a URI with local name ConceptWithoutName . This convention can be useful when converting third party word lists to ontologies.
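A minimal sketch of such camel casing, assuming that words are separated by whitespace. How the actual relabeler treats punctuation or non-Latin scripts is not covered here, and the function name camel_case is hypothetical.

```python
def camel_case(base_form):
    """Turn a base form like 'concept without name' into a camel cased
    local name by capitalising the first letter of each word."""
    return "".join(word[:1].upper() + word[1:] for word in base_form.split())

local_name = camel_case("concept without name")
```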

TFS has not got a descriptive labeling convention for URIs of multilingual messages or their texts. The justification is that texts have a weaker identity. They are less generally reused, have fewer unpredictable properties, and are less likely to get accidentally reinvented by different authors, so there is less motivation for them to carry their identities on their sleeves.

An important advantage of using descriptive URIs for TF expressions and terms, i.e. ones that code the key identifying properties of the resource in the resource name, is that such URIs are least likely to cause ambiguity in situations where the creation of new URIs is not centrally controlled. Namespaces avoid name collisions across sites, but inside a given namespace, contention between URIs is at least more easily detected and remedied when the name is descriptive of the resource. A corresponding weakness of this convention is that terms which differ only insignificantly (say, by spelling variant) may get created alongside one another and need to be mapped as equal or related after the fact. This danger could be minimised by additional conventions on the choice of representative name. On the other hand, even with a slightly leaky convention, the similarity of close variants is less likely to go undetected.

Besides genuine aliasing where different URIs are used to point to the same TF entity, resource identifiers may need to get character encoded to conform to different standards. Non-Latin script support in today's Semantic Web tools is surprisingly weak and the character conventions in different SW standards and tools far from uniform ( Auer et al. 2010 ). In OWL 2.0, ontologies and their elements are identified using Internationalized Resource Identifiers (IRIs) [RFC3987], while OWL 1.x uses Uniform Resource Identifiers (URIs). For some purposes, it is safer to use URI-encoded (percent encoded) versions of non-ASCII URIs.

Absolute TermFactory URIs like http://tfs.cc/ont/Concept or http://localhost:8080/TermFactory/query?uri=http%3a%2f%2ftfs.cc%2font%2fEnglish can contain a scheme (protocol) http:// , authority (host:port) localhost:8080 , path /ont/Concept , and query ?uri=... . A fragment identifier after a hash sign, like #Concept , at the end of a URI is not officially part of the URI, but just a URI reference. Such references are often used for ontology concept identifiers (though not in TF). A relative URI is a suffix of an absolute URI with the scheme left out.
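The parts named above can be inspected with the Python standard library. The URLs below are illustrative values modelled on the ones in the text; the hash URI is only there to show where a fragment ends up.

```python
from urllib.parse import urlsplit, parse_qs

query_url = ("http://localhost:8080/TermFactory/query"
             "?uri=http%3a%2f%2ftfs.cc%2font%2fEnglish")
parts = urlsplit(query_url)

scheme = parts.scheme          # the protocol
authority = parts.netloc       # host:port
path = parts.path              # path on the server
query = parts.query            # raw, still percent encoded

# parse_qs decodes the percent encoded value of the uri parameter.
target = parse_qs(query)["uri"][0]

# A fragment identifier is split off separately from path and query.
fragment = urlsplit("http://tfs.cc/owl/TFS.owl#Concept").fragment
```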

TermFactory URIs

This section proposes naming conventions for TermFactory resource URIs.

Hash vs. slash vocabularies

A common convention for naming resource URIs is to append the local name of a resource as a fragment identifier to the name of the ontology, separated by the cross-hatch or hash character #. This convention suggests that an ontology URI like http://tfs.cc/ont0#ctryCode should point to a location in an ontology document at URI http://tfs.cc/ont0 . In actual fact, this is not how it works most of the time.

As things go, an ontology resource URI like http://host/path#Resource may not resolve to any document in the websphere. If there is an ontology document at http://host/path at all, unless the document is HTML, the fragment identifier after the hash (#) will not single out any particular part of it.

Besides, it in general makes little sense to think of an ontology resource as a fragment of any one particular ontology document, since TF resources can be described at many locations in one document and in many different documents. A technical difficulty of the hash vocabulary convention is also that fragment identifiers are not part of the http url that gets sent between servers. A client can only ask for and receive a complete document from a server. Fragment identifiers are only meaningful for clients. (For discussion, see RDF best practices , WordNet URIs , Jeni Tennison's blog , and Hebeler et al. 2009:58 ).

A better design decision for TF ontologies is to use resolvable URLs (known as slash vocabulary) for ontology resources to begin with. TermFactory's own vocabulary is a slash vocabulary. However, fragment identifiers (hash vocabularies) are in general use, so TF had better have ways to handle them too.

URI abbreviations

There are a variety of ways of abbreviating URIs in ontology documents. For XML, URIs must be abbreviated in element and attribute names with namespace prefixes. A URI like http://tfs.cc/ont/Concept must be abbreviated with a prefix to something like ont:Concept in order to pass for an XML element name. (Such prefixed names are called qualified names, or QNames, in XML jargon.) Such prefixes can be invented ad hoc, and they remain ad hoc in the sense that there is no authority to maintain more than a handful of such prefixes globally. The prefixes currently in force can be declared at the root RDF element of an ontology RDF/XML file. Rewriting ontologies with ontology editors can unexpectedly change familiar prefixes.

Another abbreviatory device, which can be used where prefixes are not allowed (XML attribute value strings are one such place), is the XML entity reference, like &tfs; for the URI prefix http://tfs.cc/ . Such entity references start with an ampersand and end with a semicolon. They are declared at the top of an XML file in a DOCTYPE declaration before the root element.

Besides these abbreviations, there are a few (perhaps somewhat riskier) tricks. An XML root attribute xmlns="http://tfs.cc/ont/" defines a so-called empty namespace prefix, which is filled in whenever an XML element appears without a prefix. Another abbreviatory attribute is xml:base , which can be set to a URI prefix and is used to resolve (fill out) relative URIs, for instance orphan URI fragments like #Concept in attribute values. Dealing with relative URIs is convenient, but not without risks. In general, it is safer to use abbreviatory devices that leave at least some local trace of what was left out.
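What resolving against xml:base amounts to can be tried out with the standard URI resolution routine in Python. The base value below is an assumed example modelled on the TFS schema document; an actual RDF/XML parser does this resolution internally.

```python
from urllib.parse import urljoin

base = "http://tfs.cc/owl/TFS.owl"      # assumed value of xml:base

# An orphan fragment in an attribute value is filled out from the base.
concept = urljoin(base, "#Concept")

# An empty reference, as in rdf:about="", resolves to the base itself.
ontology = urljoin(base, "")

# A relative path replaces the last path segment of the base, one of the
# ways in which relative URIs can surprise.
sibling = urljoin(base, "TFTop.owl")
```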

TF3 encoding

The TF rewrite utility defines a least common denominator encoding specific to TF called TF3 encoding. TF3 encoding only uses characters in the intersection of the XML QName, URI and Turtle name non-reserved character sets, so it should survive in XML and Turtle QNames as well as in URIs. On the minus side, a TF3 encoded string may be four to eight times longer than the original. It is also more ambiguous than Turtle Unicode escapes, because all the characters are ASCII alphanumerics.

URL encoding aka percent encoding has a percent sign in front of a two-digit hex number. A character with more than two significant hex digits is first converted to its byte sequence in UTF-8. (An online converter can be found here .) Turtle (like Java) codes Unicode characters with prefix \u followed by the codepoint in four hex digits (using leading zeroes for shorter numbers). TF3 encoding consists of URL encoding with the percent sign replaced by the string u00 . Equivalently, it is Turtle encoding of UTF-8, without the beginning backslash. The XML reserved characters and the dot character, which is reserved in Turtle qualified names, are always TF3 encoded (dot as u002e). Hyphen and underscore, used as expression and term separators in TF descriptive URIs, are included in the TF3 non-reserved character set. For no good reason, Turtle shuns dots while SPARQL allows '.'s in names in all positions apart from the first or last. The Protege Turtle reader accepts dot in resource names, but not comma. It would be as well for Turtle not to reserve its punctuation characters, since they are conventionally separated by whitespace anyway.
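The encoding rule can be sketched as follows. This is an approximation under stated assumptions, not the TF rewrite utility itself: the non-reserved set is taken to be the ASCII letters and digits plus hyphen and underscore, hex digits are written in lowercase (as in u002e), and the function name tf3_encode is hypothetical.

```python
import string

# Assumed TF3 non-reserved characters: ASCII alphanumerics, hyphen, underscore.
TF3_SAFE = set(string.ascii_letters + string.digits + "-_")

def tf3_encode(text):
    """Percent encode via UTF-8, writing 'u00' in place of the percent sign."""
    out = []
    for ch in text:
        if ch in TF3_SAFE:
            out.append(ch)
        else:
            # One 'u00xx' group per UTF-8 byte of the character.
            out.extend("u00%02x" % byte for byte in ch.encode("utf-8"))
    return "".join(out)

dotted = tf3_encode("TFS.owl")
chinese = tf3_encode("工厂")
```

Each byte of a non-ASCII character costs five characters in the output, which shows how quickly encoded names grow. No decoder is sketched, since a literal u00 in the input would make decoding ambiguous, as noted above.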

TF namespaces

Besides ontologies as manually prefabricated named subsets, TF uses the query engine to create and retrieve subsets by description. Query by description has slower response time than a prefabricated set, but the results can be fresher.

The activity of assigning properties to a vocabulary is known as metamodeling or punning in OWL jargon. The trick with punning is to do it so as to keep OWL DL within standard first order logic. In order to conform to OWL 1.0 DL, the TF schema duplicates class and property names with almost but not quite synonymous individual representative instances, in order to be able to assign properties (such as terms) to classes through these representatives. The current TF namespace convention is that if a repository's ontology namespace is foo , classes and properties are in that namespace. The class representative namespace is foo0 and the individual namespace is foo1 . The systematic relation between the class ontology and the matching instance ontology in TF was originally made a paronymy rather than a full homonymy because OWL 1 DL does not allow punning. In OWL 2.0, class/instance homonymy (called in ontology jargon punning or metamodeling) is allowed, so the class and instance name can be the same.

The TF paronymy convention between classes and their puns is retained in TF for backward compatibility, but using it is optional with OWL 2. Even OWL 2.0 does not allow properties homonymous with classes/instances, so that the TF namespace paronymy is still needed to metamodel properties.

On the downside, the only way allowed in OWL to link a class or property to its pun is with an annotation property ( meta:hasPun ). The OWL standard excludes annotation properties from DL reasoning. Inferences between properties and their puns must therefore happen under RDF semantics outside official OWL reasoning. (As a matter of fact, many RDF and OWL reasoners seem to support querying annotations.) See Motik 2007 .

The current TFS entity and namespace declarations in TFS.owl and TFTop.owl are as follows. The entity names and namespace prefixes are optional in the sense that nothing is hardcoded on them at present. The entity abbreviations are used in XML attribute value strings and the prefixes in element and attribute names.

TF schema namespaces (each declaration is followed by its legend)
<!DOCTYPE rdf:RDF [ The entity declarations go in the DOCTYPE element.
<!ENTITY isocat "http://isocat.org#" > ISO data categories (in TF)
<!ENTITY tfs "http://tfs.cc/" > TF home
<!ENTITY meta "http://tfs.cc/meta/" > TF administrative classes and properties
<!ENTITY meta1 "http://tfs.cc/meta1/" > metalanguage instances
<!ENTITY term "http://tfs.cc/term/" > term classes and properties
<!ENTITY term0 "http://tfs.cc/term0/" > term class and property representatives (puns)
<!ENTITY term1 "http://tfs.cc/term1/" > term instances
<!ENTITY ont "http://tfs.cc/ont/" > concept classes and properties
<!ENTITY ont0 "http://tfs.cc/ont0/" > concept class and property representatives (puns)
<!ENTITY ont1 "http://tfs.cc/ont1/" > individual concepts
<!ENTITY exp "http://tfs.cc/exp/" > expression classes and properties
<!ENTITY exp1 "http://tfs.cc/exp1/" > expression instances
<!ENTITY exp0 "http://tfs.cc/exp0/" > expression class and property instances (puns)
<!ENTITY sign "http://tfs.cc/sign/" > sign classes and properties
<!ENTITY sign0 "http://tfs.cc/sign0/" > sign class and property instances (puns)
<!ENTITY syn "http://tfs.cc/syn/" > form classes and properties
<!ENTITY syn0 "http://tfs.cc/syn0/" > form class and property instances (puns)
<!ENTITY sem "http://tfs.cc/sem/" > meaning classes and properties
<!ENTITY sem0 "http://tfs.cc/sem0/" > meaning class and property instances (puns)
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" > xml schema datatypes
<!ENTITY owl "http://www.w3.org/2002/07/owl#" > owl namespace
<!ENTITY owl2xml "http://www.w3.org/2006/12/owl2-xml#" > owl xml namespace (not TF)
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" > rdf schema namespace (not TF)
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" > rdf namespace (not TF)
]>
<rdf:RDF The XML namespace prefix declarations are in the document root RDF element
xmlns="&tfs;owl/TFS.owl#" default namespace (not used)
xml:base="&tfs;owl/TFS.owl" the xml:base attribute (only used for the ontology element below)
xmlns:meta1="&tfs;meta1/" the rest goes as above.
xmlns:exp0="&tfs;exp0/"
xmlns:term1="&tfs;term1/"
xmlns:tfs="http://tfs.cc/"
xmlns:term0="&tfs;term0/"
xmlns:ont1="&tfs;ont1/"
xmlns:ont0="&tfs;ont0/"
xmlns:meta="&tfs;meta/"
xmlns:exp1="&tfs;exp1/"
xmlns:exp="&tfs;exp/"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:owl2xml="http://www.w3.org/2006/12/owl2-xml#"
xmlns:term="&tfs;term/"
xmlns:sign="&tfs;sign/"
xmlns:sign0="&tfs;sign0/"
xmlns:syn="&tfs;syn/"
xmlns:syn0="&tfs;syn0/"
xmlns:sem="&tfs;sem/"
xmlns:sem0="&tfs;sem0/"
xmlns:ont="&tfs;ont/"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:isocat="http://isocat.org#">
<owl:Ontology rdf:about=""> The TFS ontology element. The rdf:about attribute value is resolved (filled in) from xml:base
<owl:versionInfo>TF-Schema version 2.0 15.12.2010</owl:versionInfo> current version in an owl:versionInfo element.
</owl:Ontology>

The xml:base attribute is used by ontology editors as a shortcut to find which ontology a document contains (to avoid having to parse and load the ontology and look for ontology resource descriptions inside), so it should be kept as is.

New names for items of each type in the same ontology become new URI fragments; the rest of the URI is kept unchanged. New ontology providers or TF sites have a different scheme and path (the part identified by entity tfs above). The most fixed part of the naming convention is the part of the URI path just before the fragment, which indicates the type of item named by the URI. Each site can choose URI prefixes at will. There is no way of enforcing conventions there, nice as that might be.

Namespace expression holds forms, namespace concept meanings, and namespace term signs. Namespace sem is for TF general language semantics. Items in this namespace purport to support natural language processing such as automatic generation of definitions. The TF general language semantics is described in section on the NLP interface . Special language concepts that have instances at leaves are under ont:Concept .

Namespace prefixes are not fixed by the XML namespace standard, so each repository may choose its own prefixes. The repository's favorite prefixes can be made public in the repository's etc/ont-policy.rdf file and in home ontology documents.

The TF punning name convention is as follows. Assume ns:Foo stands for a general concept as an owl:Class. ns0:Foo names its first order pun, or metamodeling representative, from which to hang properties of the class. ns1:Foo stands for a bona fide individual member of a countable class. Singular or plural count entities, like individual countries, are in ns1 namespace. Abstract noncount entities, like names of concepts, languages or subject fields, are in ns0 namespace. Roughly, ns1 houses (more) concrete things and ns0 (more) abstract things. This differential treatment goes with the fact that countries form a partOf hierarchy (USA is a part of America), while languages or domains form a subclassOf hierarchy (American English is a subclass of English). Admittedly, rather arbitrary, but there it is. Although a punning name convention is no longer required in OWL 2.0, it is provisionally retained for backward compatibility. Third party TF terminologies are not required to follow it.

TF property names follow the naming convention that object properties (those that take other resources as values) are named in converse pairs hasPropertyX and propertyXOf . Converse relationships must be declared explicitly (the naming convention does not do the job, it is just mnemonic). What is more, converses do not come about by the declaration only. Both directions must be asserted in the ontology, or a reasoner must be used to close the ontology under converses. For this reason, a TF terminology may not look like much when viewed in the raw in an editor like Protege. To make the links implied by the schema visible, the ontology must first be classified by some reasoner. (Protege has FaCT++ built in for this.)

The class meta:Pun collects together and provides a taxonomy of the first-order representative instances of higher-order classes in TF. One subclass of meta:Pun is the class ont:Concept containing (the puns of) special language concepts. Analogous relationships hold between red objects like Rudolph's nose, the colour adjective sem:Red, its pun sem0:Red, and the class sem:Color. This way we can state that Rudolph's nose is red and red is a color without implying that the nose is a color, while staying in first order logic.

Resources vs. literals

Any graph can have internal and leaf nodes. In OWL, a type distinction is made between resources and literals. Resources can have properties including identifiers (like URIs) associated to them. Literals have no properties, so they are always leaves. A resource's URI is (at least should be) the place where that resource is at home. The URI, and information associated to it, is (or should be) enough to identify the resource and distinguish it from other resources. A literal has no home base; its meaning depends on where it occurs. From the point of view of unambiguity, a URI might seem a perfect candidate for a standardised property picklist value. In practice, however, people prefer short natural-language-like identifiers. Property value picklists are usually literals formed of suggestive (English) keywords or abbreviations, such as the ISO standard language and country codes.

Given a property URI as context, the difference between a code and a URI as value from the point of view of unambiguity is technical. For example, codes such as the ISO standard language or country codes can be literal picklist values of datatype properties. They get identified in TF uniquely as pairs of property URI and associated value. A property URI and value, related as a URL and a URL fragment, is just as good an identifier for the value as a separate URI for the value alone.

One difference between literal picklists and picklist resources is how they are documented. URIs are documented by their own TF entry. Picklist values are documented in a more roundabout way .

Another difference is that literal picklist values are not localized by TF. Only resources identified by a URI get localised. Literals are literally literals, they are what they are. If a picklist property needs localizing, make the property an object property and its values TF resources.

In coding content to OWL one must decide what content to represent explicitly in OWL and what to leave implicit, for people or external processors to interpret. The decision depends on whether one plans to do OWL reasoning on the content. For instance, to relate the part of speech codes N and V to their join N|V in OWL, the join might be defined explicitly as an OWL oneOf list. But if N|V is defined in OWL as sameAs OneOf the two atoms, the join is no longer a literal but an enumerated class . An alternative is to just add N|V as an additional atomic part of speech code . Then OWL knows nothing about the relation of the new code to its parts. That reasoning must be done elsewhere, e.g. in querying.

The OWL type distinction between resources and literals also divides up TF properties into object and datatype properties. (It is not possible to define an OWL property that subsumes both types of properties.) TF follows the convention of naming object properties with names of the form hasSomeProperty, with inverse somePropertyOf, while datatype properties have lowercase-initial names that neither start with has nor end with Of, like someAttribute. By convention, hasSomeProperty is the direct property and somePropertyOf is the inverse, so that the member of the pair that carries the owl:inverseOf property is the inverse one.
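
As an illustration only, the naming convention can be turned into a simple check on the local name of a property. The function classify_property and the labels it returns are inventions of this sketch, not part of TF.

```python
import re


def classify_property(local_name: str) -> str:
    """Guess the kind of a TF property from its local name alone."""
    # hasSomeProperty: direct object property
    if re.fullmatch(r"has[A-Z]\w*", local_name):
        return "object (direct)"
    # somePropertyOf: inverse object property
    if re.fullmatch(r"[a-z]\w*Of", local_name):
        return "object (inverse)"
    # any other lowercase-initial name: datatype property
    if re.fullmatch(r"[a-z]\w*", local_name):
        return "datatype"
    return "unknown"


for name in ("hasSomeProperty", "somePropertyOf", "someAttribute"):
    print(name, "->", classify_property(name))
```

A check like this only looks at names. Whether a property really is an object or datatype property is decided by its OWL declaration.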

Formatted text and the XMLLiteral datatype

TF string literals such as texts and baseforms can have datatype rdf:PlainLiteral by default. To indicate formatted text specially, property exp:text has a subproperty exp:textXML with object datatype rdf:XMLLiteral. This datatype can contain any well-formed XML, so exp:textXML can include formatting markup, mathematical formulas in MathML, even inline images in SVG (should one want them). Analogously, the exp:baseForm property of an expression has a subproperty exp:baseFormXML for those cases where formatting is really part of the baseform, for instance subscripts, trademarks or other markup. Comments too can contain formatting. The following is an example of Japanese written in Ruby (Rubi) characters in HTML annotation.

<owl:Class rdf:about="&tfs;exp/Text">
  <rdfs:subClassOf rdf:resource="&tfs;exp/Form"/>
  <rdfs:comment xml:lang="en">Text reifies an XML parsable text string in some language (whatever the string means)</rdfs:comment>
  <rdfs:comment rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"><span xmlns="http://www.w3.org/1999/xhtml" class="value name" datatype="rdf:XMLLiteral" title="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">This is a <strong>formatted</strong> text string in some language (whatever the string <em>means</em>)</span> </rdfs:comment>
</owl:Class>

The TF XHTML writer is set up to parse an XMLLiteral in the XHTML namespace (an XML element having the namespace attribute xmlns="http://www.w3.org/1999/xhtml" as in the example above) into HTML, so it will display right in a browser. Conversely, the XHTML parser will write such an element back into an XMLLiteral string as above. For example, the formatted XMLLiteral string above gets converted to this by the TF XHTML writer.

  • rdfs:comment
    • This is a formatted text string in some language (whatever the string means )

The rdf:XMLLiteral datatype was at risk in OWL 2 (see http://www.w3.org/2007/OWL/wiki/At_Risk#rdf:XMLLiteral_.28Ongoing.29), but got included (see http://www.w3.org/2007/OWL/wiki/Quick_Reference_Guide).

Redirection in TermFactory

The Semantic Web addressing orthodoxy is to serve a document from its URL. But documents at given URLs may not always be easily accessible to all, and certainly not editable by all. Often one needs to hold copies of a document obtained from a given URL. So we need a way to keep a mapping from the URL to the copy we want to access at a given time.

A whole host of solutions have been invented in the websphere for web address resolution, forwarding or URL redirection. Jena invented location mappings for remapping RDF documents and ontology policy files for remapping OWL ontologies. (Protege 3.1 used ont-policy files. Protege 3.2 had a home-grown repository mechanism.) XML provides XML catalogs. (Protege 4 uses them.) In addition, client or server side scripts, web servers and application containers have URL mapping or rewriting facilities. Finally, domain names can get forwarded.

This chapter covers different ways of redirecting resource identifiers (URIs) and web addresses (URLs) in TF. There are many, for different needs. One is specific for TF, others are borrowed.

  • TF location mappings and the retry utility
  • Jena RDF location mappings and OWL ont-policy documents
  • webpage forwarding with apache php
  • webpage forwarding with Tomcat servlets
  • Tomcat url rewriting
  • Apache url rewriting

TF location mappings

The TF retry location mapper is a general finite state string rewriting mechanism. It can rewrite any string at all into another string, not just a web identifier or web address of a resource. It serves to localize and globalize (web) addresses much like the rest of TF serves to localize and globalize (resource) names.

The TermFactory back end uses the retry facility and jena style location mappings to redirect ontology urls. A given uri, e.g. http://tfs.cc/ont/Finnish, can be retrieved with a query string of the following form (the example requests http://tfs.cc/owl/TFS.owl):

http://localhost:8080/TermFactory/query?url=http%3a%2f%2ftfs.cc%2fowl%2fTFS.owl

The url to be sought is given here as the value of the parameter named url, in URL encoded form. Given a request of this form, the TF query service tries to locate the url using TF location mappings repeatedly until a hit is found. The mappings are given in a jena rdf format location mapping file whose name is specified in the TF option settings file tf.properties; by default, it is called file:etc/location-mapping.n3. (The file, like the rest of the TF settings, is kept in the TF web service directory.)
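
A query string of this form can be built with any URL encoding routine. The following minimal sketch uses Python's standard library; the helper name tf_query_url is made up for the example, and the service address is the localhost one used above.

```python
from urllib.parse import urlencode


def tf_query_url(target: str,
                 service: str = "http://localhost:8080/TermFactory/query") -> str:
    """Build a TF query request for `target`, passed URL encoded in parameter `url`."""
    return service + "?" + urlencode({"url": target})


print(tf_query_url("http://tfs.cc/owl/TFS.owl"))
```

The hexadecimal digits of the percent escapes come out in upper case here; percent escapes are case-insensitive, so this is equivalent to the lower case form shown above.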

TF location mappings look as follows:
## -- TF mappings
# localhost overrides tfs.cc
[] tf:mapping [ tf:prefix "http://tfs.cc" ; tf:altPrefix "http://localhost/tfs.cc" ] .
# try dav copy first
[] tf:mapping [ tf:pattern "http://localhost/(([^d]|d[^a]|da[^v]).*)" ; tf:altPattern "+http://localhost/dav/localhost/$1" ] .
[] tf:mapping [ tf:prefix "file:///" ; tf:altPrefix "+http://localhost/dav/" ] .
# then original
[] tf:mapping [ tf:prefix "+http://localhost/dav/localhost/" ; tf:altPrefix "http://localhost/" ; ] .
[] tf:mapping [ tf:prefix "+http://localhost/dav/home/" ; tf:altPrefix "file:///home/" ; ] .
## Wordnet php entry
[] tf:mapping [ tf:prefix "Wn30:" ; tf:altPrefix "http://localhost/tfs.cc/wn30/wn30entry.php?e=" ] .
## scheme conventions
[] tf:mapping [ tf:prefix "home:" ; tf:altPrefix "http://localhost/" ] . # home: is mapped to home server document root
[] tf:mapping [ tf:prefix "site:" ; tf:altPrefix "http://localhost:8080/TermFactory/" ] . # site: is mapped to home server TF webapp root
[] tf:mapping [ tf:prefix "dav:" ; tf:altPrefix "http://localhost/dav/" ] . # dav: is mapped to TF home server dav root
[] tf:mapping [ tf:prefix "dav+" ; tf:altPrefix "http://localhost/dav/" ] . # dav+ is mapped to TF home server dav root
## prefix mappings
[] tf:mapping [ tf:prefix "ont0:" ; tf:altPrefix "http://tfs.cc/ont0/" ] .
[] tf:mapping [ tf:prefix "exp0:" ; tf:altPrefix "http://tfs.cc/exp0/" ] .
# nicknames for templates
#[] tf:mapping [ tf:name "sem" ; tf:altName "site:etc/templates/sem.ttl" ] .
#[] tf:mapping [ tf:name "syn" ; tf:altName "site:etc/templates/syn.ttl" ] .
[] tf:mapping [ tf:name "ont" ; tf:altName "file:etc/templates/ont.ttl" ] .
[] tf:mapping [ tf:name "exp" ; tf:altName "file:etc/templates/exp.ttl" ] .
[] tf:mapping [ tf:name "sem" ; tf:altName "file:etc/templates/sem.ttl" ] .
[] tf:mapping [ tf:name "syn" ; tf:altName "file:etc/templates/syn.ttl" ] .

TF location mappings are syntactically just like jena location mappings except for the tf namespace. They can be kept in the same file as jena location mappings. If there are jena location mappings in the same file, the jena mappings apply first. They work the usual jena way, either succeed or fail for good.

TermFactory mapping files can also be location mapping list files. A location mapping list file has extension .txt . Each line in a location mapping list file is a name of a location mapping file or another location mapping list file. There is an example of one in etc/wn/wnlmap.txt .

TF location mappings are used by the TF retry facility. Name mappings (tf:name and tf:altName), like the template nicknames at the end of the example above, map one complete URI to another. Most of the other examples above are prefix mappings. Prefix mappings apply to urls whose prefix matches the tf:prefix, and they replace this prefix with the tf:altPrefix.

TF built in pseudo schemes

The first mapping adds a built in pseudo scheme prefix rdb+ in front of a http://tfs.cc/ URI. The result is not a well-formed URI but it tells the TF location mapper to look for the original URI in the default database cache. The second mapping rewrites a database cache uri beginning with the given prefix as a TF query string and turns the suffix of the original URL into the uri parameter of the result URL (url encoding the suffix if needed). The rdb+ and tdb+ pseudo schemes are hardcoded in the retry utility and always resolve to default databases given in TF properties.

As the prefix mappings in the above example show, location mappings can also be used to abbreviate namespaces. A location mapping facility is just a string prefix map, so any string prefix can be mapped to any other string prefix using it. The example maps the namespace prefix ont0: to the URI slash namespace prefix http://tfs.cc/ont0/.

The example above also contains a regular expression pattern rule (tf:pattern and tf:altPattern). If the pattern matches the uri (as a whole), it is replaced by the alternative pattern. If more than one pattern fits, the pattern with the most letters and digits in it wins out. Note that pattern rules can do anything prefix rules can (and much more). The translation is straightforward:

## prefix rule
[] tf:mapping [ tf:prefix "ont0:" ; tf:altPrefix "http://tfs.cc/ont0/" ] .
## pattern rule
[] tf:mapping [ tf:pattern "ont0:(.*)" ; tf:altPattern "http://tfs.cc/ont0/$1" ] .
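
The equivalence of the two rules can be tried out with a small sketch. It is not the TF implementation: Python's re module stands in for whatever regular expression engine TF uses, and the function names apply_prefix and apply_pattern are hypothetical.

```python
import re
from typing import Optional


def apply_prefix(uri: str, prefix: str, alt_prefix: str) -> Optional[str]:
    """Prefix rule: replace a matching prefix, or return None if it does not match."""
    if uri.startswith(prefix):
        return alt_prefix + uri[len(prefix):]
    return None


def apply_pattern(uri: str, pattern: str, alt_pattern: str) -> Optional[str]:
    """Pattern rule: the pattern must match the whole uri; $1, $2 ... refer to groups."""
    match = re.fullmatch(pattern, uri)
    if match is None:
        return None
    # substitute each $n in the alternative pattern with the n-th captured group
    return re.sub(r"\$(\d)", lambda g: match.group(int(g.group(1))), alt_pattern)


print(apply_prefix("ont0:Country", "ont0:", "http://tfs.cc/ont0/"))
print(apply_pattern("ont0:Country", "ont0:(.*)", "http://tfs.cc/ont0/$1"))
```

Both calls give the same result for this rule pair. The dav pattern of the mapping file above shows what prefix rules cannot do: it matches any localhost url except those that already start with dav.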

On the other hand, prefix rules are easier to write, their interactions are more predictable, and one does not have to worry about regular expression syntax.

The TF location mapper follows the jena location mapping algorithm (map before lookup, longest match first) to prefer the longest matching rule. If there are alternative rewrites for the same url, it only tries the best match. The null prefix match is always the last choice. No url is mapped twice. If a mapping fails, retry tries to map the failed mapping. If there are alternative rules for the same prefix (should not happen), the last one wins. There is no backtracking, but mappings can define a deterministic decision tree.
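
The selection of the best rule can be sketched as follows. This is an illustration of the longest-match policy described above, not TF or jena code; best_rewrite and the sample rule list are assumptions of the sketch (the "+rdb+" catch-all rule is taken from the pseudo scheme discussion).

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (prefix, altPrefix)


def best_rewrite(uri: str, rules: List[Rule]) -> Optional[str]:
    """Rewrite `uri` with the longest matching prefix rule, or return None."""
    best = None
    for prefix, alt_prefix in rules:
        if uri.startswith(prefix):
            # ">=" lets a later rule for the same prefix override an earlier one;
            # the null prefix "" has length 0 and so is always the last choice
            if best is None or len(prefix) >= len(best[0]):
                best = (prefix, alt_prefix)
    if best is None:
        return None
    return best[1] + uri[len(best[0]):]


rules = [
    ("", "+rdb+"),
    ("http://tfs.cc", "http://localhost/tfs.cc"),
    ("home:", "http://localhost/"),
]
print(best_rewrite("http://tfs.cc/owl/TFS.owl", rules))
print(best_rewrite("file:foo", rules))
```

Only one rewrite is produced per url, which is what makes a set of mappings behave like a deterministic decision tree rather than a search.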

The TermFactory back end retries TF location mappings until it succeeds, runs out of mappings, or exceeds the number of rewrites allowed. (There is a compile time limit to how many "hops" each TF query service will allow; currently the limit is three.)

Retry returns the first location that connects. If a location fails to connect at a given time (say temporarily), it gets remapped to the next one in line. Therefore it is fragile to use retry to shadow different versions of the same document. Depending on connections, different versions may get returned. This may be good enough for some tasks, but not for editing. We do not want to accidentally revert to an older version, edit that and overwrite the latest one that just happened to be temporarily unavailable.

Note that retry is not in general reflexive, since the first hop can shadow an available location. It is not transitive either, because it returns the first available mapped location. It is possible that retry for a returns b, but retry for b returns c. After finding b for a, one should take care to fetch b without another retry, if b is what is wanted. Here is an informal condensed definition of the retry relation.

  • a retry c == if a lmap b then b retry2 c else a = c
  • a retry2 c == if a connects then a = c else if a lmap b then b retry2 c
  • pseudo+a connects if a connects
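
The informal definition above can be transcribed almost literally. In this sketch the location mapper lmap and the connection test connects are passed in as functions, and the hop limit is a parameter; all names are hypothetical and the real retry utility may differ in detail.

```python
from typing import Callable, Optional

Mapper = Callable[[str], Optional[str]]  # lmap: rewritten location, or None if no rule applies
Prober = Callable[[str], bool]           # connects: True if the location can be opened


def retry2(a: str, lmap: Mapper, connects: Prober, hops_left: int) -> Optional[str]:
    """a retry2 c: return a if it connects, else keep mapping while hops remain."""
    if connects(a):
        return a
    if hops_left <= 0:
        return None
    b = lmap(a)
    return None if b is None else retry2(b, lmap, connects, hops_left - 1)


def retry(a: str, lmap: Mapper, connects: Prober, max_hops: int = 3) -> Optional[str]:
    """a retry c: map a once before any lookup; an unmapped a is returned as is."""
    b = lmap(a)
    if b is None:
        return a
    return retry2(b, lmap, connects, max_hops - 1)


lmap = {"a": "b", "b": "c"}.get
everything_connects = lambda location: True
print(retry("a", lmap, everything_connects))
print(retry("b", lmap, everything_connects))
```

With every location available, retry for a returns b while retry for b returns c, which illustrates why retry is neither reflexive nor transitive.
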
User defined pseudo schemes

As a TF specific invention, pseudo scheme prefixes of form [^+:< ]*[+] can be invented freely and used for coding mapping preferences. A user defined pseudo scheme prefix is visible to location mappings but is removed at file connect. For instance, the empty pseudo scheme + in retry +URL tries to connect to URL. Here it has the same effect as the mapping [] tf:mapping [ tf:prefix "+" ; tf:altPrefix "" ] . The command retry +URL has the same effect as retry notry URL if URL is accessible but, unlike the latter, gets mapped further if it is not. The mappings below tell retry to first look for a URI in the DB, then in the local filesystem, and lastly as is.

[] tf:mapping [ tf:prefix "" ; tf:altPrefix "+rdb+" ] .
[] tf:mapping [ tf:prefix "rdb+" ; tf:altPrefix "+rdb+" ] .
[] tf:mapping [ tf:prefix "+rdb+http" ; tf:altPrefix "http+file" ] .
[] tf:mapping [ tf:prefix "+rdb+" ; tf:altPrefix "" ] .
[] tf:mapping [ tf:prefix "http+file" ; tf:altPrefix "http" ] .

The first two rules map all URIs to +rdb+ pseudo scheme. This makes retry always try the DB first. If DB lookup fails, http URIs are mapped to a http+file pseudo scheme (rule 3), else to the original URI as is (rule 4). If the http+file pseudo scheme is not found, the http uri is restored and tried as is (rule 5).
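
The order in which these five rules make retry visit locations can be traced with a short simulation. It only follows the string rewrites, assuming that every connection attempt fails and that the longest matching prefix wins; it ignores the hop limit and does not model how the pseudo schemes actually connect. The function names are made up for the sketch.

```python
from typing import List

RULES = [
    ("", "+rdb+"),               # rule 1
    ("rdb+", "+rdb+"),           # rule 2
    ("+rdb+http", "http+file"),  # rule 3
    ("+rdb+", ""),               # rule 4
    ("http+file", "http"),       # rule 5
]


def rewrite(uri: str) -> str:
    """Apply the rule with the longest matching prefix (the null prefix always matches)."""
    prefix, alt_prefix = max(
        (rule for rule in RULES if uri.startswith(rule[0])),
        key=lambda rule: len(rule[0]),
    )
    return alt_prefix + uri[len(prefix):]


def candidates(uri: str) -> List[str]:
    """List the locations tried in order, stopping when a rewrite repeats."""
    tried: List[str] = []
    current = uri
    while True:
        current = rewrite(current)
        if current in tried:
            return tried
        tried.append(current)


for location in candidates("http://tfs.cc/owl/TFS.owl"):
    print(location)
```

For an http URI the trace visits the DB form, then the http+file form, and finally the URI as is; a non-http URI skips the middle step.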

Besides the url parameter, a retry query can have other parameters queryDB and cacheDB with boolean values ( true or false ). With values queryDB=false and cacheDB=false , the service does not look up rdb+ prefixed location mappings and does not cache the result of a successful query in database cache. The default values true let TF use its DB cache. The parameter notry turns off trying location mappings (but builtin pseudo schemes tdb+ and rdb+ work).

Query parameters format and encoding are recognised, but they are not passed through location mappings to subsequent query engine calls. Their format and encoding are controlled by location mappings. (For instance, the query pattern url in the lm:mapping above might specify query?format=XHTML&encoding=TF-16.) If format and encoding are not specified in the query string, a TF query service engine uses defaults set in its TF properties file; failing that, compile time defaults.

Query parameter q allows specifying the query string in the query uri. (This parameter is used in broadcasting.) Query parameter sites=false prevents broadcasting.

The TF location mapping retry facility constitutes a simple-minded RESTful style implementation of a TermFactory repository network . Location mappings can be used to redirect a query for a given uri to another TF instance. Since the mapping mechanism is fully general (applies on the level of TF urls), it also applies to repository uris listed as the dataset of a TF ontology query in TF properties .

TF filename mapping conventions

Retry resolves relative file urls of form file:foo to $TF_HOME/foo. This convention helps express locations relative to TF_HOME in settings files like tf.default.properties . The unadvertised command retry absolve file:foo shows what path file:foo resolves to against current setting of TF_HOME.
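
The convention amounts to joining the relative part of the url to TF_HOME. A minimal sketch, assuming POSIX style paths and local files only; resolve_file_url is a hypothetical helper, not the retry command itself, and /opt/tf is just a placeholder default.

```python
import os
import posixpath


def resolve_file_url(url: str, tf_home: str) -> str:
    """Resolve file:foo against tf_home; absolute file urls are left absolute."""
    if not url.startswith("file:"):
        raise ValueError("not a file: url: " + url)
    path = url[len("file:"):]
    if path.startswith("/"):
        # absolute form such as file:///home/x: not relative to TF_HOME
        return "/" + path.lstrip("/")
    return posixpath.join(tf_home, path)


# TF_HOME is read from the environment here; /opt/tf is an arbitrary fallback
print(resolve_file_url("file:etc/location-mapping.n3", os.environ.get("TF_HOME", "/opt/tf")))
```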

Some TermFactory resource files collected in the $TF_HOME/etc directory may need to be web accessible at some designated public url. Here is one convention for doing so. Assume a Tomcat server holding the TF web services also serves the TermFactory webapp. This is true of the current TermFactory home server http://tfs.cc and is the likely default in other TF installations. (If not, just use some other suitable target for the mappings.) Put whatever resources need publishing under the TermFactory url in the TermFactory webapp's root directory $CATALINA_HOME/webapps/TermFactory/ .

The TermFactory root server is at http://tfs.cc . The ontologies that it maintains are at http://tfs.cc/owl/ . In general, the default place for the ontologies held by a TermFactory site http://site is http://site/owl/ . Localization files for an ontology may be elsewhere, at a site that needs to provide a given localization language for an ontology. A site may mirror another site locally and map the remote site's urls to (more) local ones in its location map file.

It may also be useful to have conventional site relative location mappings for some locations. We may propose the following conventional pseudo schemes as defaults in a site location-mapping.n3:

[] tf:mapping [ tf:prefix "site:" ; tf:altPrefix "http://site:8080/TermFactory/" ] . # site: is mapped to site TF webapp root
[] tf:mapping [ tf:prefix "home:" ; tf:altPrefix "http://site/" ] . # home: is mapped to site TF document root

To recapitulate: file $TF_HOME/foo can be abbreviated in TF properties or XHTML writer parameters as file:foo. Given location mappings as above, a file published for external use at the site TermFactory webapp root http://site:8080/TermFactory/foo could be abbreviated as site:foo. A file under the home site document root, say http://site/owl/foo, could be abbreviated as home:owl/foo.

apache2 URL rewriting

This section describes how to use apache2 URL rewriting to turn a request for a TF concept like http://tfs.cc/ont/English into a TF retry request, so that a simple url like the first one below can abbreviate a long one like the second.

http://localhost/ont/English
http://localhost:8080/TermFactory/query?uri=http%3A%2F%2Ftfs.cc%2Font%2fEnglish

It is enough to turn on the apache2 rewrite module (mod_rewrite) and insert one rewrite rule. Assuming an out-of-the-box apache2 installation, the following addition to the default virtual host definition file /etc/apache2/sites-enabled/default should do the job. (The ellipsis stands for the pre-existing content of the file. You need to restart apache2 after the changes.)

<VirtualHost *:80>
...
# to rewrite incoming urls as calls to TermFactory query service
# turn rewrite engine on
RewriteEngine on
# log rewrites to error.log (optional)
RewriteLog /var/log/apache2/error.log
# log level (optional, the default is 0 meaning no logging)
RewriteLogLevel 9
# redirect a uri of form ont/English to TF query uri
RewriteRule /(.*)/(.*)$ http://localhost:8080/TermFactory/query?url=http%3A%2F%2Ftfs.cc/$1%23$2 [B]
</VirtualHost>

Note that we cannot map from the original ontology uri containing a fragment identifier using url rewriting. The optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. As such, it is not part of a http GET request. That is to say, a http client does not send the fragment to the server; it requests the whole url and only looks for the fragment in the response at the client end. Since the fragment is not part of the GET http request, apache2 mod_rewrite cannot see or capture the fragment in the redirection match. For apache server end redirection to work, fragment identifiers must get rewritten into resolvable URLs at the client end. In the simplest case, all it takes is a replace of # with / in a TF uri before asking for it from a repository. These two forms of URI are then treated as aliases for the same resource in TF. (For discussion see here.)
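
The client side replacement of # with / can be sketched in a few lines. The helper names are invented for the example, and the hash form http://tfs.cc/ont#English is used only to illustrate the aliasing.

```python
from urllib.parse import urldefrag


def server_visible(uri: str) -> str:
    """The part of a uri that a http client actually sends to the server."""
    return urldefrag(uri).url


def hash_to_slash(uri: str) -> str:
    """Rewrite a hash uri into its slash alias; uris without a fragment are unchanged."""
    base, fragment = urldefrag(uri)
    if not fragment:
        return base
    return base.rstrip("/") + "/" + fragment


print(server_visible("http://tfs.cc/ont#English"))
print(hash_to_slash("http://tfs.cc/ont#English"))
```

The first call shows why the server never sees the fragment; the second produces a uri that can be requested, and rewritten by the server, as a whole.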

A silly client side workaround to make hash vocabulary point to entry pages instead of document locations is this. Place the following index.html file at the directory indicated by the URI before the hash. The index file lists all the resources with that uri prefix naming them as anchor locations. Then use client side javascript to redirect the hash locations to the corresponding entry files when the index file is loaded.

<html>
<head>
<title>Concept instance index</title>
<script type="text/javascript">
function ShowHash() {
  // alert("fragment ID is " + document.location.hash);
  if (document.location.hash) {
    // strip the leading "#" to get the name of the entry page
    var page = document.location.hash.substring(1);
    if (page) window.location = page;
  }
}
</script>
</head>
<body onload="ShowHash()">
<h1>Concept instance index</h1>
<ul>
<li> <a href="ctryCode">ctryCode</a> </li>
</ul>
</body>
</html>

The XML Resource Directory Language RDDL proposes a more general approach to this problem. RDDL is an extension of XHTML designed to allow both human readers and software robots to find any sort of resource associated with a particular namespace. Instead of putting one thing at the end of a namespace URI, RDDL puts a document there that lists all the machine-processable documents that might be available. An RDDL document identifies each related resource by a resource element in the http://www.rddl.org/ namespace, which is customarily mapped to the rddl prefix. This element is a simple XLink (that is, it has an xlink:type attribute with the value simple) and its xlink:href attribute points to the related resource. Furthermore, the xlink:role attribute identifies the nature of the related resource and the optional xlink:arcrole attribute identifies the purpose of the related resource. An optional xlink:title attribute can provide a brief description of the purpose of the link. RDDL is not related to GRDDL , specs for reading RDF off of X(HT)ML documents using XSLT.

Tomcat URL rewriting

To make a Tomcat web application redirect URLs it receives, the Tuckey rewrite filter can be installed. First, the filter jar urlrewrite-3.2.0.jar is placed in the webapp's WEB-INF/lib folder. Second, the rewriter is set up by adding the following elements to the webapp's deployment config file web.xml, inside the web-app element:

<filter>
  <filter-name>UrlRewriteFilter</filter-name>
  <filter-class>org.tuckey.web.filters.urlrewrite.UrlRewriteFilter</filter-class>
  <init-param>
    <param-name>logLevel</param-name>
    <param-value>WARN</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>UrlRewriteFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

Third, a file of url rewrite rules urlrewrite.xml is placed in the WEB-INF directory of the web application.

Formats

This section documents data formats handled by TF.

Semantic Web file formats

There are a variety of formats to represent an RDF graph as text. The normative (standard) syntax for RDF is RDF/XML, an XML document format for RDF triples. Unfortunately, XML makes RDF look more complicated than it is. Moreover, there is no canonical or normal form for RDF, so different tools may generate different (though equivalent) serialisations for the same RDF. (For a remedy, see below.)

A relatively human-friendly syntax for RDF is Turtle. Turtle is a simplified, RDF-only subset of Tim Berners-Lee's Notation 3 (N3). N3 has several features that go beyond a serialization for RDF models, such as support for RDF-based rules. Here is the result of a TF query around the concept of ISO country codes in Turtle format.

Show/hide Turtle

The document starts with a long list of namespace abbreviations (prefixes). Turtle RDF triple terms are separated by whitespace, except quoted strings can contain whitespace. Minor punctuation can be used to fold together similar triples as follows. Triples that only differ by object can be written by writing the common subject and predicate followed by a comma-separated list of the objects. Triples that share subject can be written by writing the common subject followed by a list of predicate-object pairs between semicolons. Each group ends in a full stop.
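
The folding rules can be illustrated with a toy serialiser that groups triples by subject and predicate. It is a sketch only: the terms are taken as ready-made Turtle tokens, the prefix ex: and the sample triples are hypothetical, and the @prefix declarations a real Turtle document needs are left out.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # subject, predicate, object as Turtle tokens


def fold(triples: List[Triple]) -> str:
    """Write triples with ',' between shared-predicate objects and ';' between predicates."""
    by_subject: Dict[str, Dict[str, List[str]]] = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)
    groups = []
    for s, predicates in by_subject.items():
        pairs = [p + " " + " , ".join(objects) for p, objects in predicates.items()]
        # each subject group ends in a full stop
        groups.append(s + " " + " ;\n    ".join(pairs) + " .")
    return "\n".join(groups)


triples = [
    ("ex:Finland", "rdf:type", "ex:Country"),
    ("ex:Finland", "rdfs:label", '"Finland"@en'),
    ("ex:Finland", "rdfs:label", '"Suomi"@fi'),
]
print(fold(triples))
```

The three triples come out as one group: the subject is written once, the two labels share their predicate, and the group is closed by a single full stop.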

In Turtle, (absolute or relative) URIs are surrounded by angle brackets. Directive @base can be used to resolve relative URIs against given base URI. By default, the base is the URI of the document. In addition namespace prefixes can be defined with directive @prefix . Turtle base and prefix work the same way as corresponding XML attributes explained below.

The Turtle format handles unabbreviated non-Latin IRIs fine. In Turtle, character encoding problems are avoided by foregoing prefixes and keeping resource names as unabbreviated IRIs (e.g. <http://biocaster.nii.ac.jp/biocaster1#鳥インフルエンザ-N>). Turtle is more restrictive about qualified (prefixed) resource names. More precisely, Turtle qualified names allow XML qualified name (QName) characters except dot \u002e. TF3 encoding can be used for encoding non-ascii URIs in Turtle as prefixed resource names.

Blank nodes can be written with prefix _: followed by an arbitrary ID. Alternatively, a node can be written with a matching pair of left and right brackets in front or around its properties: [] :property :value or [ :property :value ] .

Literal strings are surrounded by single or triple double-quote characters (triple-quoted strings may contain newlines). The datatype of a literal is written after the literal, linked to it with two carets (^^).

A weakness of RDF/XML for multilingual work is that there is no provision for coding property names containing reserved characters. The only RDF/XML representation for property names is an XML qualified name (QName), which necessitates the use of character encoding for non-Latin property names. Plain uri-references in percent encoding cannot be used as such because percent is not allowed in QNames (see discussion). Our solution is to use TF3 encoding. A more general solution would be to extend RDF/XML with an <rdf:Property rdf:resource=URI> element analogous to <rdf:Description rdf:about=URI> for triples. Compare also CURIEs. CURIEs are a compromise between QNames and URIs designed for use in XML attribute values.

Below is an example of the TF entry for the concept of ISO country code in RDF/XML format.

Show/hide RDF/XML

The root element (tag) of the document is rdf:RDF, and the content is a list of rdf:Description elements describing resources (rdf nodes). Ontology namespace prefixes are declared by the xmlns attributes. Namespace prefixes only help abbreviate uris. They make it easier to write and change uris textually should the namespace change. They are local (so they can change between rewrites) and have no fixed meaning (except for prefixes reserved by w3c, such as xml, rdf, rdfs, owl ...). The default namespace declaration (attribute xmlns with no prefix) defines the namespace of bare XML element names; unprefixed attribute names are not affected by it.

The xml:base URI attribute sets the base URI for resolving relative RDF URI references. By default, the base URI is the URI of the document. The xml:base URI applies to all RDF/XML attributes that deal with RDF URI references: rdf:about, rdf:resource, rdf:ID and rdf:datatype.

The node described by a rdf:Description is identified by an rdf:about, rdf:ID or rdf:nodeID attribute (if at all). A rdf:ID attribute on a node element can be used instead of rdf:about. It has the hash character built in, so that rdf:ID="name" is equivalent to rdf:about="#name". While there can be any number of descriptions rdf:about a given resource, rdf:ID is an XML ID, so it can only appear once in the scope of a given xml:base.

Blank nodes (anonymous nodes which have no resource URI) can be given an xml document local ID with attribute rdf:nodeID. This is needed when the same blank node is referred to more than once. The node ID is local, meaning it may change between rewrites of the same document.

To recap: XML entity names can be used to abbreviate URIs in XML attribute values. XML namespace prefixes are used to abbreviate URIs in element names. xml:base attribute allows truncating absolute URIs into relative ones in its scope. These are all XML document related abbreviatory conventions to avoid writing URIs in full, nothing to do with the RDF graph or OWL ontology being described.

Here is a representative beginning of an archetypal OWL RDF/XML ontology document :

<?xml version="1.0"?>
    (xml document declaration)
<!DOCTYPE rdf:RDF [ <!ENTITY wn "http://www.ontologyportal.org/WordNet.owl#" > ... ]>
    (rdf document type declaration, containing the entity definition for prefix wn for use inside xml attribute values)
<rdf:RDF xmlns="http://www.ontologyportal.org/WordNet.owl#"
    (xmlns defines the namespace for plain, unprefixed xml elements)
xmlns:wn="http://www.ontologyportal.org/WordNet.owl#"
    (xmlns:wn defines the namespace for xml element or attribute names prefixed with wn:)
xml:base="http://www.ontologyportal.org/WordNet.owl"
    (xml:base resolves, i.e. completes, relative URIs in its scope)
>
<owl:Ontology rdf:about=""/>
    (Ontology URI is xml:base plus "", i.e. nothing. xml:base defaults to the filename when not present.)
<rdf:Description rdf:ID="WN30-101644373">
    (Description URI is xml:base plus "#" plus ID)
<wn:word rdf:resource="#WN30Word-tree_frog"/>
    (Property URI is xmlns:wn value plus "word", object URI is xml:base plus attribute value)
<word rdf:resource="&wn;WN30Word-tree-frog"/>
    (Property URI is xmlns value plus "word", object URI is wn entity value plus attribute value)
</rdf:Description>
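
The way xml:base completes the relative references in the sample can be reproduced with standard relative URI resolution. The sketch below uses Python's urljoin as a stand-in for an RDF/XML parser; the function names are made up, and the base URI is the xml:base value of the sample.

```python
from urllib.parse import urljoin

BASE = "http://www.ontologyportal.org/WordNet.owl"  # xml:base of the sample document


def resolve_about(base: str, reference: str) -> str:
    """Resolve the value of rdf:about or rdf:resource against the base URI."""
    return urljoin(base, reference)


def resolve_id(base: str, name: str) -> str:
    """rdf:ID="name" is equivalent to rdf:about="#name"."""
    return urljoin(base, "#" + name)


print(resolve_about(BASE, ""))                     # the ontology itself
print(resolve_id(BASE, "WN30-101644373"))          # the described node
print(resolve_about(BASE, "#WN30Word-tree_frog"))  # the object of wn:word
```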

TF ontologies typically contain resources from many different namespaces. Resources in them are identified by explicit prefixes or entity references instead of relative URIs. TF ontology URIs identify documents. The document/ontology URI is not related to the URIs of the resources defined inside the ontology. For instance, TF Schema ontology uri is http://tfs.cc/owl/TFS.owl, but no other resources in it have the same prefix.

RDF/XML can be abbreviated in various ways (see the standard ). A variant format called RDF/XML-ABBREV groups and nests triples sharing subject, predicate, or object nodes inside one another much the same way as Turtle does.

  • object node abbreviated as attribute
    • long form: <some:property> <rdf:Description rdf:about="&my;Item"/> </some:property>
    • abbreviated: <some:property rdf:resource="&my;Item"/>
  • description node abbreviated as parseType attribute
    • long form: <some:property> <rdf:Description> <some:otherProperty/> </rdf:Description> </some:property>
    • abbreviated: <some:property rdf:parseType="Resource"> <some:otherProperty/> </some:property>
  • blank node with literal abbreviated as attribute
    • long form: <some:property rdf:parseType="Resource"> <some:attribute>literal</some:attribute> </some:property>
    • abbreviated: <some:property some:attribute="literal"/>
  • type abbreviated as element name
    • long form: <rdf:Description> <rdf:type rdf:resource="&my;Class"/> </rdf:Description>
    • abbreviated: <my:Class/>

Besides Turtle and RDF/XML, OWL has a number of formats of its own. OWL 2 has a normative functional-style syntax . These OWL specific formats are not used in TF for now. Nothing prevents using them with third party tools that support them. There is an online converter between different OWL syntaxes at http://owl.cs.manchester.ac.uk/converter/restful.jsp .

TF entries

One of the most striking differences between traditional approaches to terminology and TF is the absence from TF of one fixed notion of a terminology entry as an information container. The semantic network metaphor for information management adopted in TF is dual to the container metaphor that traditional terminology management inherited from hierarchical databases and, more recently, XML. The container metaphor comes from physical media like paper or magnetic tape. Containment among convex objects naturally forms tree structures. In a rooted directed tree, it makes sense to talk of nodes as bigger elements containing smaller elements. In an undirected tree or graph, all nodes are equal; any node can be taken as root or focus. Nodes do not contain one another; rather, they are visualised as dots connected by links. RDF/OWL graphs are not rooted. The serialisation of a semantic network in RDF/XML need not respect connectedness. Information concerning a given node may be distributed freely among disconnected descriptions in an RDF/XML document.

This allows a clearer view of the contrast between the disciplines of terminology and lexicography. Terminology is concept oriented, lexicography is expression oriented. Orientation has to do with the choice of origin and order of traversal of the ontology graph about it.

Difference between terminology and lexicography

  • choice of origin
    • concepts (e.g. Country)
    • expressions (e.g. "China")
    • terms (e.g. 'China (the country)')
  • layout
    • concept oriented: Western terminology
    • expression oriented: Lexicography
    • term oriented: Contrastive lexicography

  • units of a vocabulary
    • concepts (e.g. country)
    • expressions (e.g. "Nauru")
    • terms (e.g. 'Nauru (country)')
  • starting unit and order of presentation
    • concept oriented: Wüster
    • expression oriented: dictionaries
    • term oriented: Kudashev

TermFactory supports all of the above ways of working.

A naive term list enumerates terms one by one, listing all properties associated directly or indirectly to each in one go, without distinction: properties related to its referent(s), lexical properties, and properties related to the term as such. This shallow end of the term pool is what we call TF Lite. At the deep end, the commonalities between terms that share meaning are reified into a separate entity, the shared meaning; the shared semantic properties are associated to that entity, and the meaning is associated to the term. Symmetrically, the commonalities between terms that share lexical and grammatical properties are collected into a new entity, the shared expression. Intermediate cases are obtained by making or not making the term/concept or term/expression splits individually. The aim is non-redundancy: every property of a resource must provide a fact about the resource, the whole resource, and nothing but the resource, "so help me Codd".

Nothing is absolute here: terminology is an applied science. What seems one and the same concept in one analysis can get refined into many later. By parity, the category of expressions allows refinement: what counts as an expression for one purpose can become a term with a meaning under further analysis.

Orientation, literally, has to do with the orientation of the RDF graph around ("about") a node in the graph. A given layout is determined by the choice of root and the order of traversing the edges of the graph to obtain a spanning subtree that constitutes a tree-formed hierarchical representation of it, and, eventually, the choice of serialisation of the tree into a one-dimensional stream of characters.
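
The idea that a layout is a spanning tree grown from a chosen root can be illustrated with a minimal Python sketch. The toy graph and its node names (ont:China, term:en-China and so on) are hypothetical, and breadth-first traversal is only one possible traversal order; the actual TF writer is template driven.

```python
from collections import deque

def spanning_tree(edges, root):
    """Breadth-first spanning tree of an undirected graph, as {node: [children]}."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    tree, seen, queue = {root: []}, {root}, deque([root])
    while queue:
        node = queue.popleft()
        # Visit neighbours in alphabetical order so the layout is deterministic.
        for other in sorted(neighbours.get(node, [])):
            if other not in seen:
                seen.add(other)
                tree[node].append(other)
                tree[other] = []
                queue.append(other)
    return tree

# Hypothetical toy graph: one concept, two terms, two expressions.
EDGES = [
    ("ont:China", "term:en-China"),
    ("ont:China", "term:fi-Kiina"),
    ("term:en-China", "exp:China"),
    ("term:fi-Kiina", "exp:Kiina"),
]

concept_view = spanning_tree(EDGES, "ont:China")     # concept oriented layout
expression_view = spanning_tree(EDGES, "exp:China")  # expression oriented layout
```

The same edges yield different hierarchies: in the concept view the concept contains its terms, while in the expression view the expression contains a term, which in turn contains the concept.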

It is a hallmark of the TF approach to separate these concerns from the real objects terminology is about, namely terminological resources: instances, classes, properties, and statements connecting them. At the same time, the TF abstraction of terminological content from any fixed grouping or ordering of data leaves room for different mind sets or approaches to terminology work. Such approaches can be seen as different choices of granularity and orientation. Different choices may fit different tasks.

The Western standard normative approach to terminology founded by Eugen Wüster, codified in ISO terminology standards, aims at standardisation and harmonisation of international terminology. This starts with language independent concept analysis, where the expert community agrees on the referents of special language terminology, whatever they are called. In standardization, choices are made of standard expressions to designate the referents in each language. In harmonization, national differences are ironed out. The Western view of a well-behaved term comes quite close to the TF Full profile of the TF schema, depicted above .

Traditional descriptive lexicography distinguishes between expressions (lemmas) and terms (word senses). A traditional lexicographical entry is rooted in a lemma and enumerates the senses of the expression in some order, usually based on partial similarity of meaning or usage of the senses. A bilingual dictionary entry is rooted in a source expression and lists target language expressions arranged in a similar way by shared or at least similar senses of the source and the target expressions.

Further approaches are possible. A term oriented approach (Kudashev) can wind out the graph starting from a term as root and enumerate other terms which the root bears selected (e.g. semantic) relations with, in the same language or in other languages. Such a contrastive layout can be particularly useful for translators.

This freedom from fixed structure offers both advantages and disadvantages considering format conversion. It can make conversion toward TF easier, because the target format is relatively free; no particular topology need be followed in converting to TF, so conversion need not change the source topology. On the other hand, it means that conversion from TF to more rigid formats can involve search (like queries from relational databases). A query language supported by a reasoner can be of much help here. For one thing, just because RDF/OWL serialisation to XML is not constrained, the resulting topology is rather unpredictable. This makes applying XML tools like XSLT to RDF/XML fragile. A standard serialisation of TF to some (more) fixed XML format, say TBX, is one solution to this. Another one is to divide and conquer the problem using TF profiles .

The TF XHTML format allows choosing between concept-oriented, term-oriented and lemma-oriented layouts.

TF terms can be distributed through the collaborative interfaces, they can be queried directly from the repository system, or they can be converted to other existing formats and distributed through existing distribution channels (electronically, in print, by phone etc.). Formats handled by TF will include at least the LISA TBX terminology exchange format and the commercial SDL MultiTerm XML export format. Further formats can be added on demand.

TF formats

This section documents data formats native to TF.

XHTML

This is an XHTML format for viewing and editing of TF entries on (X/HT)ML platforms. The format is not limited to TermFactory; it can be used to view and edit any RDF. The layout of the XHTML document can be customised with a template that is itself written in RDF.

This format is implemented as a TF model to XHTML writer/converter TF2XHTMLWriter and the converse XHTML to TF reader/converter XHTML2TFReader. It represents TF models in the form of a sorted list of tree-structured concept-oriented entries. The visual rendering can be modified using css styles. The XHTML format is really just an alternative syntax notation for TF models. Like the other readers the XHTML reader loads OWL imports if readAll is on.

The XHTML format document type is XHTML+RDFa where the TF model triples are coded as RDFa annotations. From this format, the TF model underlying the XHTML entry can be extracted using a standard RDFa distiller . RDFa abbreviates URIs using compact URIs or CURIEs (roughly, URI abbreviations resembling qualified names in square brackets).
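
As a rough illustration of how a CURIE abbreviates a URI, here is a minimal Python sketch. The function name expand_curie and the choice to return unknown prefixes unchanged are assumptions made for the example; a real RDFa processor follows the full RDFa CURIE rules.

```python
def expand_curie(curie, prefixes):
    """Expand prefix:reference (optionally in square brackets) to a full URI."""
    text = curie.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]  # drop the square brackets of a safe CURIE
    prefix, sep, reference = text.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return curie  # assumption: leave unknown prefixes untouched

PREFIXES = {
    "term": "http://tfs.cc/term/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

full_uri = expand_curie("[term:hasReferent]", PREFIXES)
```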

The settings that control the output of the XHTML writer are explained below. Here is a summary:

XHTML
template=<URI>         template to define entry structure
root=<URI>             root filter (list of instances/classes to include in entry)
schema=<URI>           bridge schema to map user ontology to template
active=<URI>           active ontology that gets edited
original=<URI>         original ontology to show what got added
edits=<URI>            edited ontology to show what got deleted
locals=<URI>           localization vocabulary
lang=<ISO langcode>    localization language
links=<URI>            hyperlink mapping

The many parameters of the XHTML format give a lot of choices about displaying TF content as a webpage. On the other hand, failure to specify the right options may spoil the looks badly, and in the worst case prevent anything at all from showing up. Fortunately it is not necessary to specify parameters explicitly unless one wants to. TF site wide defaults can be set in TF properties. For more convenience, the XHTML writer writes the options it used in the XHTML document head element as HTML meta tags. Correspondingly, the XHTML reader looks into the header and uses any values it finds there, and writes the parameters it uses in turn as annotation properties into the model it constructs. Here is an example of an XHTML parameter header:

<head>
  <meta content="text/html" http-equiv="content-type"/>
  <meta content="TF2XHTMLWriter" name="generator"/>
  <meta content="http://tfs.cc/TermFactory/etc/templates/sem.ttl" name="template"/>
  <meta content="http://tfs.cc/sem/Meaning" name="root"/>
  <meta content="http://tfs.cc/owl/wn/TFwn.owl" name="schema"/>
  <title>TF wn30:synset-entity-noun-1</title>
  <link href="/css/tf2xhtml.css" rel="stylesheet" title="server" type="text/css"/>
</head>

Here is the corresponding RDF graph written into the model shown in TURTLE:

[] rdf:type meta:Entry ;
    meta:generator "TF2XHTMLWriter" ;
    meta:root <http://tfs.cc/sem/Meaning> ;
    meta:active <http://tfs.cc/owl/wn30/wn30entry.php?e=cat-noun-1.ttl> ;
    meta:schema <file:owl/wn/TFwn.owl> ;
    meta:template <file:etc/templates/sem.ttl> .

If there is more than one meta triple for a parameter in an entry document, the reader chooses one at random.
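
The way a reader might pick up writer parameters from the head element can be sketched in Python as follows. This is not the XHTML2TFReader code; the function name read_parameters is hypothetical, and the sketch only mirrors the behaviour described above, including the random choice among duplicate values.

```python
import random
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def read_parameters(xhtml_text, rng=random):
    """Collect name/content pairs from meta elements; pick one value per name."""
    root = ET.fromstring(xhtml_text)
    candidates = {}
    for meta in root.iter(XHTML_NS + "meta"):
        name, content = meta.get("name"), meta.get("content")
        if name and content:  # skips http-equiv metas, which carry no name
            candidates.setdefault(name, []).append(content)
    # With several values for one parameter, one is chosen at random.
    return {name: rng.choice(values) for name, values in candidates.items()}

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta content="text/html" http-equiv="content-type"/>
<meta content="TF2XHTMLWriter" name="generator"/>
<meta content="http://tfs.cc/TermFactory/etc/templates/sem.ttl" name="template"/>
<meta content="http://tfs.cc/sem/Meaning" name="root"/>
<meta content="http://tfs.cc/owl/wn/TFwn.owl" name="schema"/>
<title>TF wn30:synset-entity-noun-1</title>
</head><body/></html>"""

params = read_parameters(SAMPLE)
```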

XHTML templates

(version 2.2) A serialisation of an RDF graph, such as a TURTLE file, constitutes a spanning forest of trees whose nodes are RDF nodes and arcs RDF properties. Any TF XHTML entry structure is a variant of this format. XHTML entries are nested html lists of entries, properties and values. Different entry layouts come about by choosing the roots of the forest, the choice and order of properties, the order of property values, and the depth of nesting. With template explicitly set to an empty model, the XHTML writer produces a property tree isomorphic with TURTLE output, with triples sorted and grouped alphabetically by subject, predicate and object. The default layout in etc/template.ttl is a link to the concept layout ont.ttl . It chooses concepts as roots and nests properties down to the level of terms and expressions. Properties are sorted concept properties first, followed by definitions and terms, followed by the rest. This layout as well as alternatives to it can be defined using RDF format template files. This example shows that template files are technically OWL ontology files and as such can import other such files. The imported template is shown further below.

Show/hide top level of concept entry template

The four entry layout parameters mentioned above (choice of roots, choice and order of properties, order of property values, and depth of nesting) are controlled by the template. The choice of roots (what to count as an entry) depends on the template and the schema. If there is no template, all resources count as entries, and all triples in the model are printed in a plain flat list format. If there is a template model, the XHTML writer looks in it for a template triple like meta:Entry meta:hasSubClass ont:Concept . The first triple found with this subject and predicate is used to set the entry root class (here, ont:Concept ). The class of subjects to include in the write can be overridden by an explicit root parameter. If the root parameter is a class mentioned in the template or subclassed to one by the model or schema, it will be used to filter entries. Failing that, the root parameter is taken to be a list of particular instances. This is the default assumption if more than one root is given, or the root parameter is not typed by the model, schema, or template as a class. If the root resource cannot be typed to some class mentioned in the template, the template entry root class is assumed.

Only schema reasoning is used to determine entries at write time, because reasoners tend to be slow with type realization on large instance bases. Schema reasoning is faster but not complete. There may remain instances in the data that are not listed as entries, though they could be proved to be members of the entry root class from the model and schema together, using type realization and enough time.

The choice of properties and the level of nesting are controlled by matching the input model with the structure of the template. A property of a resource is included in the listing if it matches a property of the current template node. If there is no schema, model and template properties are matched by identity. If there is a schema, there is a match if the model property is a subproperty of a template property by the schema. The first match is taken; there is no backtracking.

The reserved wildcard property meta:property matches all properties not otherwise matched. The wildcard object owl:Nothing matches nothing, so a matching triple with this object excludes a property from the listing. The wildcard subject owl:Thing matches any node. It is tried last if there is no other match.
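
The matching rules for properties can be summarised in a small Python sketch. It is an illustration under stated assumptions, not the writer's implementation: the function match_property, the pair-list representation of a template node, and the property ex:denotes are all hypothetical.

```python
WILDCARD_PROPERTY = "meta:property"
EXCLUDE = "owl:Nothing"

def match_property(prop, template_triples, superproperties=None):
    """Return the template object matched for prop, or None if prop is not listed.

    template_triples: ordered (property, object) pairs of the current template node.
    superproperties: optional schema information, mapping a model property to
    the set of properties it is a subproperty of. Without it, only identity matches.
    """
    supers = (superproperties or {}).get(prop, set())
    matched = None
    for template_prop, obj in template_triples:
        if template_prop == prop or template_prop in supers:
            matched = obj
            break  # the first match is taken, no backtracking
    if matched is None:
        # Fall back to the wildcard property, if the template node has one.
        for template_prop, obj in template_triples:
            if template_prop == WILDCARD_PROPERTY:
                matched = obj
                break
    # A match whose object is owl:Nothing excludes the property.
    return None if matched in (None, EXCLUDE) else matched

TEMPLATE_NODE = [
    ("rdf:type", "_:Type"),
    ("term:hasReferent", "_:Concept"),
    ("rdfs:comment", "owl:Nothing"),
    ("meta:property", "_:Other"),
]
SCHEMA = {"ex:denotes": {"term:hasReferent"}}  # hypothetical subproperty axiom

without_schema = match_property("ex:denotes", TEMPLATE_NODE)
with_schema = match_property("ex:denotes", TEMPLATE_NODE, SCHEMA)
```

Without the schema the hypothetical property only reaches the wildcard; with the schema it is treated as term:hasReferent.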

The boolean property triple meta:Entry meta:invent "true" . in a template model tells the writer to invent inverse property names for properties missing inverses, by suffixing -InverseObjectProperty to any such property name. This makes sure those triples where the subject resource only appears "passively", as an object, are included into the listing.

The order of properties is based on the reserved template sort property meta:cmp . By default, properties are sorted alphabetically by URI. A triple like rdf:type meta:cmp "1" . tells the XHTML writer to use the string "1" to sort property rdf:type by instead of the URI. Since numbers are sorted before letters, this causes rdf:type to get sorted first. NOTE! meta:cmp values sort as strings, not numbers, so 100 sorts before 9.
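
The string sorting caveat is easy to reproduce. The short Python sketch below contrasts string and numeric ordering; zero-padding the keys is shown only as one possible workaround and is not something the TF documentation prescribes.

```python
keys = ["9", "100", "10", "2"]

as_strings = sorted(keys)                   # compared character by character
as_numbers = sorted(keys, key=int)          # what one might naively expect
padded = sorted(k.zfill(3) for k in keys)   # padded strings sort like numbers
```

Since meta:cmp values behave like as_strings, sort keys of equal length (for example "009" and "100") avoid the surprise.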

We may want to sort property values by content. For instance, terms should be sorted by language code. This can be accomplished by a template like the following.

term:Term meta:cmp _:lang1 ;
    term:hasDesignation [ exp:langCode _:lang1 ] ;

This template tells that terms are sorted by the language code of the designation, for the value of the sort property equals the langCode property of the designation of the term.

A node template can be given the property meta:sortBy to control how properties of a node matching the template are sorted in printing. Currently, three values are implemented. The default value "property" sorts triples by property, as explained above. Value "value" sorts properties first by value, then by property. All properties whose values have a sort template as explained above are sorted first, and ties are sorted by property. Value "mixed" makes use of "hierarchical" sort properties whose values have a "decimal" dot as in

rdf:type meta:cmp "1" .
term:contentOf meta:cmp "2.1" .
term:referentOf meta:cmp "2.2" .
term:hasExplanation meta:cmp "3" .

In the mixed sort mode, properties are first sorted using just the first half of the sort property value, before the decimal dot (if any). In the above example, "1" gets sorted before "2.1" and "2.2", but the latter two are tied. After that, the sort goes on as in the previous case: first by value, and finally by property again (this time using the entire sort property string as usual). For instance, the above sort property settings cause type properties to get sorted first; after that, definitions and terms are first grouped by language code, and within each group, definitions are sorted before terms. This sort order comes close to TBX term entry structure.
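
The three-stage comparison of the mixed mode can be expressed as a composite sort key. The following Python sketch assumes that each row is a pair of a property and the sort value of its object (here a language code); the function mixed_key and the sample rows are hypothetical.

```python
CMP = {
    "rdf:type": "1",
    "term:contentOf": "2.1",
    "term:referentOf": "2.2",
    "term:hasExplanation": "3",
}

def mixed_key(row):
    """Sort key: part before the dot, then value, then the whole sort string."""
    prop, value_key = row
    if prop in CMP:
        cmp_value = CMP[prop]
        major = cmp_value.split(".", 1)[0]
    else:
        cmp_value = major = prop  # default: sort by the property name itself
    return (major, value_key, cmp_value)

rows = [
    ("term:referentOf", "fi"),
    ("term:hasExplanation", ""),
    ("term:contentOf", "fi"),
    ("term:referentOf", "en"),
    ("rdf:type", ""),
    ("term:contentOf", "en"),
]

ordered = sorted(rows, key=mixed_key)
```

In the result, rdf:type comes first, the "2.x" properties are grouped by language code with "2.1" before "2.2" inside each group, and term:hasExplanation comes last.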

For more control, blank leaf nodes inside anonymous templates can be used as bindable variables to control printing of nodes in the model graph. Blank leaf nodes in an anonymous template get bound to corresponding nodes in the model graph during the printing of an entry. A node is only printed if it satisfies the bindings of the template instance created during the traversal. Thus for instance, in the sample lexicographic template exp.ttl , definition texts get printed out according to the language of the expression, because the _:lang node gets bound to the language of the expression when exp:Expression is traversed, and this binding is checked from inside _:Text at the time _:Definition is entered.

For maximum versatility, reasoning can be used to alter the topology of the model. For instance, if one wants to associate terms in a given language directly to definitions in the same language, one can add to the input model some property connecting them. (For instance, sign:synonymWith can be used to link terms directly to definitions.) New intervening node types to group items can also be added without changing the XHTML writer, for instance, to implement the TBX auxiliary notion of language set.

Other than the cases described above, template resource names are not matched against resource names in the model. (They might, but so far such fine tuning has not been called for.)

For decorating the XHTML layout, there is also a template property meta:css that can be used to add a CSS class attribute to a property or value. For instance, the triple rdf:type meta:css "sem" . marks the type property as a semantic property in the XHTML. By the default stylesheet tf2xhtml.css , it gets color blue.

When a schema file is specified, the rdf:type property of resource meta:Entry in the template (the only one, or a random one if there are several) is used as the superclass to select entries from the input model. Entries that are explicitly of the given class, or whose explicit types can be inferred to belong to it using the given schema, are included. Type reasoning is not applied to the input model, as this can take very long with large input files.

The template to write XHTML entries with can be specified explicitly. The default value is etc/templates/ont.ttl , the model template for TF concept based entries. The default value can be changed in the TF configuration file tf.properties . Template files can include other template files using the owl:imports property. (For examples, see etc/templates.) The command line parameter root can also be used for specifying the output root filter from command line.

The TF XHTML format allows choosing between concept-oriented, term-oriented and lemma-oriented layouts using the entry parameter with values in ont|term|exp . The default is a concept-oriented entry. For examples, see the section on entries . In order to implement some desired layout of the data, the topology of the model to write may need to be enriched (say, using a reasoner) so as to create more connectivity into the graph. For instance, in Legacy terminology, a term and a definition are connected through a shared concept. To finish this section, here is the default template for showing concept entries.

Show/hide body of concept entry template

XHTML entry schema

In the absence of an XHTML schema, the XHTML writer follows the given template literally. Properties in the template are matched literally with those in the model to write. If an explicit schema ontology is given, matching is subject to the schema. For instance, WordNet entries can be written with the generic meaning based sign schema etc/sem.ttl as template parameter if one uses the TF WordNet schema owl/wn/TFwn.owl as the schema parameter:

rewrite template=etc/sem.ttl schema=file:owl/wn/TFwn.owl io/entity.ttl XHTML > io/entity.xhtml

The schema parameter thus allows the same template to match a variety of different ontology formats. It is enough for the schema to subsume properties used in the model under the properties mentioned in the template. (For efficiency, some TF Schema inverse relationships have been hardcoded into the writer, such as term:hasReferent/term:referentOf, term:definitionOf/term:hasDefinition . These settings cannot be overridden by a supplied schema parameter.)

XHTML active ontology

The TF query facility and the editor are designed to allow editing just a part of a larger model. Separating the edited model (the subset returned by a query) from the active model makes it possible to edit selected parts of a large ontology without dragging all of it to the editor.

A terminology query can produce content from several different ontologies, also from different repositories through relays. The information obtained from the different sources may be crucial for understanding what changes are needed. At the same time, edits should only be done in the ontology currently under active development, one to which the current editing team has edit permissions. Some solution like the Protege facility of choosing the active ontology among the ontologies shown (greying the rest) is indicated. OWL 2 has only just begun to support per-statement source annotations, using a variant of the RDF reification (quad) technique. It takes four more triples to annotate a triple in the ontology. Pending a more efficient technique for source indication (Ontotext has one), it seems easier to do a second query for editable content from the active ontology and use that to single out the editable elements at the edit interface level.

For this purpose, the TF jena model XHTML writer can be given as parameter the active model. Given this parameter, the writer marks those triples that come from the active model with attribute class="editable" and those that do not with class="readonly" . An editing tool can then restrict editing to the active ontology, so that a subsequent save of the edits can be included in the right ontology. (Compare the forthcoming HTML 5 contenteditable attribute .)
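
The marking step amounts to a membership test against the active model. A minimal Python sketch, with triples represented as plain tuples and with a hypothetical function name mark_triples:

```python
def mark_triples(shown, active):
    """Pair each displayed triple with the class the writer would attach to it."""
    active_set = set(active)
    return [(t, "editable" if t in active_set else "readonly") for t in shown]

# Hypothetical triples: the first belongs to the active ontology, the second does not.
shown = [
    ("ex:cat", "rdfs:label", "cat"),
    ("wn:cat-noun-1", "rdf:type", "sem:Meaning"),
]
active = [("ex:cat", "rdfs:label", "cat")]

marked = mark_triples(shown, active)
```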

XHTML original and edits

The original and edits ontology parameters can be used to print differences between the original and the edited ontology. Entries and triples missing from the ontology given by the original parameter are tagged with CSS class "added" and shown underlined by the default stylesheet tf2xhtml.css . Entries and triples missing from the ontology given by the edits parameter are tagged with CSS class "deleted" and shown struck out. For an example, see section on TF revision control .
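
Conceptually, the two comparisons are set differences over triples. The Python sketch below only computes which triples of the written model are missing from each comparison ontology; the function name and the sample triples are hypothetical, and the CSS tagging itself is left to the writer as described above.

```python
def missing_from(written, comparison):
    """Triples of the written model that do not occur in the comparison ontology."""
    return set(written) - set(comparison)

original = {("ex:cat", "rdfs:label", "cat"), ("ex:cat", "rdfs:comment", "old note")}
edited = {("ex:cat", "rdfs:label", "cat"), ("ex:cat", "rdfs:comment", "new note")}

# Writing the edited model and comparing with the original parameter:
not_in_original = missing_from(edited, original)
# Writing the original model and comparing with the edits parameter:
not_in_edits = missing_from(original, edited)
```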

XHTML localization

The TF to XHTML writer can take as parameters a localisation ontology and language code. It then prints out localised labels for property names and values whenever it can find terms in the localisation ontology tagged with the requisite language code. Conversely, the XHTML to TF parser can take the corresponding parameters and use them to map the localised labels back to TF entity URIs.

The TF to XHTML writer and its converse, the XHTML to TF parser are parametrised with a localisation model and lang code. Given these parameters, a TF model is serialised in XHTML with property and value URIs labeled with strings taken from the localisation model and language. Conversely, when an edited XHTML document is parsed back into TF, labels used in the XHTML document are mapped back to TF URIs by looking up corresponding localisation terms from the localisation model.

The localization file for the XHTML writer/reader can also be a JSON format file as produced by the Perl + SPARQL script localize described below. The localization file's full URL can be given as a parameter. The JSON file extension must be .json . A location for localization files can be specified with TF option TF_LOCALS in etc/tf.properties . The default location is where the ontologies are, file:owl/ . If a relative filename of form *.json is explicitly given, it is resolved against the value of TF_LOCALS . If no localisation file is given, then the value of TF_LOCALS is used as the URL. If this URL is a directory and the localization lang is (say) fi, then filename fi.json is resolved against that directory.
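
The resolution rules can be restated as a short Python sketch. It is a reading of the paragraph above, not TF code: the function name resolve_localization is hypothetical, a trailing slash is assumed to identify a directory, and a relative name without the .json extension is simply passed through, which the source does not specify.

```python
import re

_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

def resolve_localization(locals_param, lang, tf_locals="file:owl/"):
    """Work out which localization file to load."""
    if locals_param:
        if _SCHEME.match(locals_param):
            return locals_param  # a full URL was given explicitly
        if locals_param.endswith(".json"):
            # a relative *.json name is resolved against TF_LOCALS
            return tf_locals.rstrip("/") + "/" + locals_param
        return locals_param  # assumption: anything else is left as given
    if tf_locals.endswith("/") and lang:
        # TF_LOCALS is a directory: look for <lang>.json inside it
        return tf_locals + lang + ".json"
    return tf_locals

default_fi = resolve_localization(None, "fi")
```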

XHTML links

The TF to XHTML writer adds hyperlinks to resources mentioned in the document. The target of the link is by default the URI of the resource. The hyperlink target can be remapped with a location mapping file. The default is etc/links.n3. It can be changed with option TF_LINKS in configuration file etc/tf.properties or with a switch called links . Link mappings are standard TF location mappings. The link mapper applies mappings until it runs out of applicable mappings or TF_HOPS is exceeded. Below is an example of a link mapping that maps SUMO wordnet URIs to the TermFactory query service.

[] lm:mapping [
    lm:prefix "http://www.ontologyportal.org/WordNet.owl#WN30-" ;
    lm:altPrefix "http://localhost:8080/TermFactory/query?format=XHTML&schema=http://tfs.cc/owl/wn/TFwn.owl&uri=http://tfs.cc/wn/WN30-"
] .
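
The repeated application of prefix mappings with a hop limit can be sketched in Python as follows. The function map_link and the default limit of 5 are assumptions for the example; the actual value of TF_HOPS is a configuration matter.

```python
def map_link(uri, mappings, max_hops=5):
    """Rewrite uri with (prefix, alt_prefix) pairs until none applies or hops run out."""
    for _ in range(max_hops):
        for prefix, alt_prefix in mappings:
            if uri.startswith(prefix):
                uri = alt_prefix + uri[len(prefix):]
                break
        else:
            return uri  # no applicable mapping left
    return uri  # hop limit reached

MAPPINGS = [(
    "http://www.ontologyportal.org/WordNet.owl#WN30-",
    "http://localhost:8080/TermFactory/query?format=XHTML"
    "&schema=http://tfs.cc/owl/wn/TFwn.owl&uri=http://tfs.cc/wn/WN30-",
)]

target = map_link("http://www.ontologyportal.org/WordNet.owl#WN30-12345", MAPPINGS)
```

The hop limit matters when mappings form a cycle: without it, two mappings that rewrite into each other would loop forever.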

XHTML CSS stylesheet

The XHTML writer links the XHTML document to a CSS stylesheet named tf2xhtml.css . The link points to the TF server root location /TermFactory/css/tf2xhtml.css . (The writer also adds an alternate stylesheet link titled "local" that points to the location of the XHTML document itself, to use as a fallback when server root is not accessible.) This CSS file can be customised at will. The output template provides a meta property meta:css that lets the template instruct the writer to provide any RDF property element it writes out with one or more CSS class attributes. Then it is easy to associate desired CSS decorations to those class attributes. Advanced CSS selectors are able to associate decorations directly to RDF properties mentioned in the XHTML, but going via XHTML class attributes declared with meta:css can be useful as an intermediate level of abstraction.

The default tf2xhtml.css has been intentionally left bland, so that the relationship of the XHTML to the underlying RDF model is not obscured. Color coding is used to set off different property types. String literals (the only text that editor users normally need to type in) are in black. Entries are in the default color maroon. Concepts (meanings) are shown in blue, terms (signs) in green and expressions (forms) in dark yellow (orange). Entries, terms, and expressions are boxed in the corresponding color. Lighter shades of green and yellow indicate definition and text fields. Editable content is strong (boldface), read-only content watered out (transparent). As a demonstration of the many things one can do with CSS alone, the sample stylesheet hides from view the term and expression URIs inside a given box until one hovers over the top of the box. CSS 'skins' are conservative in that they cannot reorder content, so it is usually still possible to edit the XHTML through the skin. CKEditor normally applies source CSS when in WYSIWYG mode, so provided CKEditor succeeds in finding tf2xhtml.css , the WYSIWYG mode will show the styles.

XHTML javascript

For yet fancier control on the looks, the writer links the XHTML document to a javascript source named tf2xhtml.js at location /TermFactory/js/tf2xhtml.js . The default script activates the property list bullets in the XHTML so that clicking on any one collapses the box under it (hides the property list from view). Clicking on the bullet again expands the box again. Again, the javascript can be customised at will. CKEditor does not normally load javascript associated to a source. In the TF Editor, box collapsing/expanding in a document inside the editor is enabled with a custom js script /TermFactory/js/custom_config.js . With it, boxes in a TF2XHTML document inside the editor text can be collapsed and expanded by ctrl-clicking on the bullet.

XHTML headers

The XHTML writer writes out the parameter settings it used when doing the write in HTML header elements. Here is a sample header element.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet href="tf2xhtml.css" title="local" type="text/css"?>
<?xml-stylesheet href="/TermFactory/css/tf2xhtml.css" title="server" type="text/css"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:exp="http://tfs.cc/exp/"
      xmlns:exp0="http://tfs.cc/exp0/"
      xmlns:exp1="http://tfs.cc/exp1/"
      xmlns:meta="http://tfs.cc/meta/"
      xmlns:meta0="http://tfs.cc/meta0/"
      xmlns:meta1="http://tfs.cc/meta1/"
      xmlns:ont="http://tfs.cc/ont/"
      xmlns:ont0="http://tfs.cc/ont0/"
      xmlns:ont1="http://tfs.cc/ont1/"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:owl2xml="http://www.w3.org/2006/12/owl2-xml#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:sem="http://tfs.cc/sem/"
      xmlns:sem0="http://tfs.cc/sem0/"
      xmlns:sem1="http://tfs.cc/sem1/"
      xmlns:sign="http://tfs.cc/sign/"
      xmlns:sign0="http://tfs.cc/sign0/"
      xmlns:sign1="http://tfs.cc/sign1/"
      xmlns:syn="http://tfs.cc/syn/"
      xmlns:syn0="http://tfs.cc/syn0/"
      xmlns:syn1="http://tfs.cc/syn1/"
      xmlns:term="http://tfs.cc/term/"
      xmlns:term0="http://tfs.cc/term0/"
      xmlns:term1="http://tfs.cc/term1/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<head>
  <meta content="text/html" http-equiv="content-type"/>
  <meta content="TF2XHTMLWriter" name="generator"/>
  <meta content="file:etc/templates/ont.ttl" name="template"/>
  <meta content="file:etc/links.n3" name="links"/>
  <title>TF ont0:ctryCode</title>
  <link href="tf2xhtml.css" rel="alternative stylesheet" title="local" type="text/css"/>
  <link href="/TermFactory/css/tf2xhtml.css" rel="stylesheet" title="server" type="text/css"/>
  <script src="/TermFactory/js/tf2xhtml.js" type="text/javascript"> </script>
</head>

When an XHTML meta entry is converted into RDF, the XHTML meta headers are converted into a blank node of type meta:Entry with matching TF meta namespace triples. Here are Turtle triples corresponding to the above header.

[] rdf:type meta:Entry ;
    meta:generator "TF2XHTMLWriter" ;
    meta:links <file:etc/links.n3> ;
    meta:template <file:etc/templates/ont.ttl> .

When the XHTML writer writes a model back to XHTML, it looks for these values as defaults. As a result, an XHTML entry can be roundtripped through RDF without having to bother about the settings. (Among other things, this allows caching XHTML entries in an RDF database.)

TF2XML deprecated v. 1.9

TF2XML is a simple XML layout format for viewing and editing of TF entries on XML platforms. It is supported by a TF model to XML converter TF2XMLWriter and the converse XML to TF converter XML2TFReader. It represents TF models in the form of a list of hierarchical concept-oriented entries conforming to the TF Full profile. There is an XML schema tf2xml.xsd for this format in TF/io .

TF3

The TF3 (TF triple) normal form of a TF ontology aims to minimise free variation between file versions.

Then standard textual diffs and their visualisations can focus on relevant differences. In the current implementation, we added to Jena's triple reader/writer library a pair of TF specific triple reader/writers with symbolic name TF3. The TF3 format is a special case of Turtle format where statements are written one per line and sorted alphabetically. TF3 format should be readable as a rdf triple file by any Turtle reader. For details see here .
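
Why a sorted one-statement-per-line form helps textual diffs can be shown with a minimal Python sketch. It assumes the input already has one statement per line and is not the TF3 writer itself; the function names and the sample statements are hypothetical.

```python
import difflib

def normalise(lines):
    """One statement per line, duplicates removed, sorted alphabetically."""
    return sorted({line.strip() for line in lines if line.strip()})

def changed_statements(old_lines, new_lines):
    """Added and removed statements according to a standard unified diff."""
    diff = difflib.unified_diff(
        normalise(old_lines), normalise(new_lines), lineterm="", n=0
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

old = ['ex:a ex:p ex:b .', 'ex:a ex:q "x" .']
new = ['ex:a ex:q "x" .', 'ex:a ex:p ex:c .']  # reordered, one object changed

delta = changed_statements(old, new)
```

After normalisation, a mere reordering of statements produces an empty diff, so only the changed object shows up.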

JSON

JSON (an acronym for JavaScript Object Notation pronounced /ˈdʒeɪsən/) is a lightweight text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for most scripting languages. The JSON format was originally specified by Douglas Crockford, and is described in RFC 4627. The official Internet media type for JSON is application/json. The JSON filename extension is .json. The JSON format is often used for serializing and transmitting structured data over a network connection. It is primarily used to transmit data between a server and web application, serving as an alternative to XML.

There are several proposals for representing RDF in JSON . TermFactory currently supports the Talis specification .
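
For orientation, the Talis RDF/JSON layout nests subjects, then predicates, then a list of object descriptions with "type" and "value" keys (plus optional "lang" or "datatype"). The Python sketch below builds such a structure from a list of triples; the helper name to_rdf_json and the sample URIs are hypothetical, and this is not the TF serialiser.

```python
import json

def to_rdf_json(triples):
    """Group (subject, predicate, object-description) triples Talis style."""
    doc = {}
    for subject, predicate, obj in triples:
        doc.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return doc

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

triples = [
    ("http://example.org/China", RDF_TYPE,
     {"type": "uri", "value": "http://example.org/Country"}),
    ("http://example.org/China", RDFS_LABEL,
     {"type": "literal", "value": "China", "lang": "en"}),
]

document = to_rdf_json(triples)
text = json.dumps(document, indent=2)
```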

SPARQL SELECT query results are produced in the Jena result set JSON serialization.

Another de facto standard JSON format for RDF is the Exhibit JSON format from the MIT Simile project. There is an online converter babel here or here .

There is no general JSON to RDF reader in TF as yet. Script io/script/json2tf wrapping Java class com.grapson.tf.rev.jena/Json2TF converts the TF json localization file format back to TF Compact.

A json localization file can be roundtripped to TF offline using the following toolchain.

json2tf file:fi.json > fi.ttl
pellet4tf query -e SPARQL -q io/sparql/terms.sparql fi.ttl > fi1.ttl
rewrite schema=../owl/TFS.owl create label replace fi1.ttl TURTLE > fi2.ttl
localize fi2.ttl > fi.ttl

Here fi.json is a TF json localization file generated with localize. Script json2tf converts it to (practically isomorphic) TF Compact triples. Query io/sparql/terms.sparql applies OWL property chain axioms in reverse to interpolate anonymous terms. The rewrite utility generates descriptive TF URIs for the anonymous resources. To finish the roundtrip, script localize runs a localization query to get back to the identical JSON format we started from. This toolchain can also be run through the TF services. Here is a query URL to convert fi.json back to a TF Full ontology with standard TF descriptive expression and term URIs.

http://localhost:8080/TermFactory/query?url=fi.json&schema=TFS.owl&rw=relabel&f=TURTLE
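A query URL of this shape can also be assembled programmatically. The following is a minimal Python sketch, not part of TF: the helper name query_url is hypothetical, while the parameter names (url, schema, rw, f) and the endpoint are those of the example URL above.

```python
from urllib.parse import urlencode

def query_url(base, **params):
    """Append URL-encoded query parameters to a service address."""
    return f"{base}?{urlencode(params)}"

# Rebuild the example query URL from its parts.
url = query_url(
    "http://localhost:8080/TermFactory/query",
    url="fi.json",
    schema="TFS.owl",
    rw="relabel",
    f="TURTLE",
)
print(url)
```

Note that urlencode percent-encodes reserved characters, so an address such as file:fi.json would appear as file%3Afi.json in the query string.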

TBX

TBX (Termbase Exchange format) is a localisation industry (LISA) standard for the interchange of terminology data including detailed lexical information.

TBX has been republished as ISO standard 30042. The framework for TBX is provided by three ISO standards: ISO 12620, ISO 12200 and ISO 16642. ISO 12620 provides an inventory of well-defined “data categories” with standardized names that function as data element types or as predefined values. ISO 12200 (also known as MARTIF) provides the basis for the core structure of TBX. ISO 16642 (also known as Terminological Markup Framework) includes a structural metamodel for Terminology Markup Languages in general. There is an online ISO registry of terminology data categories at http://www.isocat.org/interface/index.html or http://www.isocat.org/files/12620.html. An older listing also exists.

The contents of the ISO registry are a mixed bag at present, so it cannot be taken as the paragon for TF as such.

TBX is designed to support the analysis, representation, dissemination, and exchange of information from terminological databases (termbases). It is intended to qualify as a TML (Terminology Markup Language) as defined in the Terminology Markup Framework (TMF) specified in ISO 16642:2003. In addition, TBX is intended to support the extraction and merging of information from other, non-TMF-compliant, formats, although these processes may involve some information loss.

TMF (ISO 16642) is an abstract data model to describe a potentially infinite set of Terminological Markup Languages (TML), that can be expressed for the interchange of computerized terminological data using, for example, XML. TMF does not describe one specific format, but acts as a kind of meta-model based on the following elementary notions:

  • The meta-model: a unique information structure shared by all TMLs and which decomposes the organization of a terminological database into basic components as shown in figure 1. This model is in keeping with the traditional concept-oriented view of a terminological entry dating back to Wüster’s early works [Picht & Schmitz, 2001] and widely adopted in the community;
  • Information units (which we refer to as data categories): derived as a subset of a Data Category Registry (DCR, see below) as needed for a given format. This may also contain additional data categories specifically defined for the current application, which may hinder interoperability with other formats;
  • Methods and representations: the means to actually implement the TML by instantiating the structural skeleton in combination with the chosen data categories, for instance by automatically generating an XML schema for the TML. This comprises the mappings between data categories and the vocabularies used to express them (e.g. as an XML element or a database field).

The TMF metamodel defines a hierarchical entry as shown in the following figure. Its top level describes a concept, the middle tier groups terms by language, and the lower levels describe terms and their parts.

TMF metamodel

Although TMF abstracts away from XML concrete syntax, it sticks to the notion of a tree structured entry. The structural elements do not provide any specific information from a terminological perspective, but rather contribute to the organization of the terminological entry. Specifically, structural nodes may serve two purposes: structure sharing (entries can share triples through a shared node) and property inheritance (properties such as source indications and other annotations can be inherited from a structural node). A tree is a special case of a graph, so the structure sharing function of structural nodes can be implemented in TF too. Property inheritance can also be expressed explicitly in TF using OWL axioms and rules.
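The tree-structured entry and its property inheritance can be illustrated with a small Python sketch. This is an assumption-laden toy model, not TMF or TF code: the class Node, the level names and the sample property values are all hypothetical. It only shows how an annotation placed on a structural node becomes visible to the nodes below it.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Node:
    """One structural node of a tree-shaped terminological entry."""
    level: str                                   # "concept", "language" or "term"
    props: dict = field(default_factory=dict)    # properties stated on this node
    children: list = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def lookup(self, key: str):
        """Return the nearest value for key, walking up towards the root."""
        node = self
        while node is not None:
            if key in node.props:
                return node.props[key]
            node = node.parent
        return None

# Hypothetical entry: concept level, then language level, then term level.
entry = Node("concept", {"source": "sample glossary"})
english = entry.add(Node("language", {"lang": "en"}))
term = english.add(Node("term", {"term": "forest"}))

print(term.lookup("lang"), term.lookup("source"))
```

In a graph model such as TF, the same effect is obtained without a privileged tree shape, for instance by an axiom or rule that propagates the annotation along the relevant property.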

TermFactory supports two-way conversions between the TF ontology format and the Term Base Exchange (TBX) standard format. TBX can thus be used as an exchange format through which ontology-to-terminology-to-ontology conversions happen.

TF2TBXWriter.java is a TF to TBX converter written in Java using Jena. There are some awkward spots in the conversion where the TBX data categories fail to match ontology language semantics. In particular, traditional terminology theory and TBX have less place for the first order logic distinction between individuals and classes than TF. Setting aside such differences, the TF to TBX conversion and the TBX to TF conversion are true inverses, meaning that after a round trip, the TF3 form of an ontology file remains fixed in the conversion. (Since the TF to TBX conversion is lossy, the initial round may filter out elements that have no conversion mappings and add implicit information.)

LMF

ISO 24613:2008, Language resource management - Lexical markup framework (LMF), is the ISO/TC37 (International Organization for Standardization) standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. The scope is standardization of principles and methods relating to language resources in the contexts of multilingual communication and cultural diversity.

LMF appears to follow the same Saussurean model as TF, except that an LMF lexical entry allows many senses. This is the lexicographical notion of a lemma-keyed lexical entry.

MultiTerm

SDL (previously Trados) MultiTerm is a commercial terminology management tool that has been on the market since the 90's. It is a searchable database which allows creating hierarchically organized multi/mono-lingual term entries with user-definable fields. It is sold standalone and as a part of the SDL (previously Trados) computer aided translation package.

SDL MultiTerm has an XML export format. A schema document for the format MTF-schema.xml could be found in the Docs subdirectory of TRADOS Freelance Edition 7. The schema document, written in Microsoft's obsolete XDR schema, was converted using Microsoft's XDR to XSD converter into XML Schema language. An XML schema MTF-schema.xsd for the format is included in the TF conversion library.

Conversion

Third party ontologies can be imported to TF in different ways depending on the available formats and the character of the relationship between TF and the third party contents.

If the third party ontology (call it O3) is already in a TF native format, there is nothing to do but link it to other TF ontologies. In practice, this usually means building a bridge ontology that imports O3, TF schema, and possibly other related ontologies and statements that help bridge O3 to the TF schema. For example, an ontology of ship diesel engines can be related to TF via a bridge ontology that loads a generic engine ontology, the TF schema, and contains bridge statements that place its classes and properties inside the engine ontology and/or relate its classes and properties to TF schema.

Another example is the BioCaster epidemic ontology, which is an OWL ontology that describes diseases and their names in many languages. Though its syntax is OWL, the class and property structure differs from TF Full (it is best construed as an instance of TF Lite). Here, there are two extremes (with various intermediate mixtures), that are analogous to compilation and interpretation in computing.

The syntactic extreme is to write a conversion script that converts BioCaster classes and properties to corresponding TF classes and properties, perhaps factoring in new TF style instances, classes and namespaces. The conversion script can work at different levels of the Semantic Web language stack (OWL/RDF/XML). On the RDF/OWL level, conversion can happen by a program using some RDF/OWL API like Jena, or it can be written as a sequence of SPARQL CONSTRUCT queries. The advantage of doing an RDF/OWL level conversion is that the conversion is robust against variations in file formats. Alternatively, one can write an XSL script to transform an RDF/XML ontology file to another, TF compliant file. XSLT scripting is fast, but the disadvantage is that the conversion becomes dependent on the many variations of the RDF/XML format. The advantage of doing syntax conversion is that once converted, the third party ontology is a native TF ontology that can be queried and maintained using TF tools without any intermediate steps. On the minus side, if the third party ontology changes outside of TF, a new conversion is needed.

The semantic extreme is to leave the third party ontology unchanged and relate it to TF in OWL using bridge statements and/or rules. For instance, a bridge statement could relate BioCaster property englishTerm to TF term:referentOf as a subproperty with appropriate restriction on it, saying the target term must be an English term. The advantage of this option is that less (perhaps no) conversion is needed, as the work relating the ontologies is done by the reasoner. The third party ontology can evolve freely and stay available for TF queries at any time. On the minus side, the third party ontology cannot be used in TF without a reasoner that does the bridging. If reasoning is done at runtime, this will slow down response times.
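The contrast between the two extremes can be sketched in a few lines of Python. This is a toy illustration over plain triples, not TF code: the sample resource names are hypothetical, only the property names englishTerm and term:referentOf come from the text, and the restriction to English terms is not modelled.

```python
# Bridge statement: the third party property is a subproperty of the TF property.
BRIDGE = {"biocaster:englishTerm": "term:referentOf"}

def convert_once(triples):
    """Syntactic route: rewrite the data itself, once (like compilation)."""
    return [(s, BRIDGE.get(p, p), o) for s, p, o in triples]

def holds(triples, s, p, o):
    """Semantic route: leave the data alone and apply the bridge at query time."""
    return any(
        ts == s and to == o and (tp == p or BRIDGE.get(tp) == p)
        for ts, tp, to in triples
    )

# Hypothetical third party data.
data = [("bc:Influenza", "biocaster:englishTerm", "en-influenza-N")]

converted = convert_once(data)
print(converted)
print(holds(data, "bc:Influenza", "term:referentOf", "en-influenza-N"))
```

The converted copy answers TF queries directly but goes stale when the source changes. The query-time version always sees the current data, at the cost of doing the bridging work on every query.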

Semantic conversion using SPARQL queries can take the form of an OWL ontology that (recursively) imports SPARQL CONSTRUCT queries on one or more other ontologies. This type of conversion document gets automatically updated whenever the imported ontologies change. Although a conversion may be too complicated to do with one SPARQL query, by query imports it is possible to join results of several queries in parallel or in series so that the final composite outcome is the desired one. This idea of virtual conversion is discussed in the section on TF profile conversion.

A variety of ontologies and third party terminology collections in legacy formats have been imported or converted to TF form to test the TF concept. The 4M project ontology is the historical starting point of TF. The mobilite space ontology was imported as is, as an example of importing a third party OWL ontology. A fragment of the Finnish YSO library ontology (in OWL Full) was extracted as an example of a subject field ontology. A paper industry vocabulary was converted from MultiTerm format. A third party glossary of building management terms was chosen as an example of importing from a legacy format first to TBX and from TBX to TF. As an example of ad hoc conversion, a multilingual glossary of chemistry terms was converted directly from Excel sheets to TF OWL/XML using Perl.

TBX to TF conversion

This section is a guide to the conversion of a terminology in the Term Base Exchange (TBX) format into TF.

(version 0.0) The converter tbx2owl.xsl is an XSL(T) 1.0 script which transforms a TBX xml document into an OWL RDF/XML document.

TBX Basic is a simpler subset of TBX. An XSL 2.0 script tbx2tfs.xsl converts TBX Basic documents to TF. The converter comes with a separate RDF/XML document tbx-mapping.rdf (by default, located in $TF_HOME/etc) which spells out the element by element correspondences between TBX data categories and TFS. (The mapping file format is RDF/XML in order to be applicable as is both ways.) The mapping file consists of correspondence rules like

<rdf:Description>
  <tbx rdf:parseType="Resource">
    <name>termNote</name>
    <type>partOfSpeech</type>
    <text>noun</text>
  </tbx>
  <tfs>
    <exp:Designation>
      <exp:catCode>N</exp:catCode>
    </exp:Designation>
  </tfs>
</rdf:Description>
<rdf:Description>
  <tbx rdf:parseType="Resource">
    <name>termNote</name>
    <type>partOfSpeech</type>
    <text>&DUMMY1;</text>
  </tbx>
  <tfs>
    <exp:Designation>
      <exp:catCode>&DUMMY1;</exp:catCode>
    </exp:Designation>
  </tfs>
</rdf:Description>
<rdf:Description>
  <tbx rdf:parseType="Resource">
    <name>descrip</name>
    <name>other</name>
    <text>&DUMMY1;</text>
  </tbx>
  <tfs>
    <owl:Individual>
      <term:hasDescription>
        <term:Description>
          <exp:Text>
            <exp:text>&DUMMY1;</exp:text>
          </exp:Text>
        </term:Description>
      </term:hasDescription>
    </owl:Individual>
  </tfs>
</rdf:Description>

The tbx node of a rule describes the TBX element in terms of the element name, type and content. The tfs node describes the TF side. The top node defines the subject of the target graph. The graph under it gives the properties that in TF correspond to the TBX element. &DUMMY1; is a variable indicating content shared by both sides. other is a dummy for "other" tbx types. A match is exact if context, name, type and content all match, and a partial match if context and name match. A partial match is more exact if type matches. The logic followed by the converters TF2TBXWriter.java and tbx2tf.xsl in matching such correspondences is as follows.

  1. A given input matches a correspondence if it is an exact match to the source element.
  2. A partial match matches input except for &DUMMY1; and other.
  3. An exact match is preferred over partial matches, a more exact match is preferred over a less exact match.
  4. The first match wins if there are many.
  5. A target element is written only if context matches.
  6. If there is no target element nothing is written.
  7. Text matching &DUMMY1; in the source is substituted for &DUMMY1; in the target.

This logic supports various strategies:

  • In order to change some values and keep the rest, write exact match rules and a dummy to dummy general rule.
  • In order to keep some values and suppress the rest, write exact rules and a dummy empty target rule.
  • In order to pass values except some, write a dummy general rule and exact match empty target rules.
  • In order to suppress values except some, write a dummy empty target rule and exact match rules.
  • In order to convert values (except some) to a default, write a dummy to default rule (plus some exact rules).

The current version of the converter only handles one shared variable.
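The matching logic can be approximated in a short Python sketch. It is an illustration under stated assumptions, not the code of TF2TBXWriter.java or the XSL converter: rule targets are abbreviated to strings instead of RDF graphs, context matching is left out, the ranking of partial matches (type match counted before content match) is one possible reading of "more exact", and the admin rule is a hypothetical example of an empty target.

```python
DUMMY = "&DUMMY1;"  # shared variable, as in the mapping file
OTHER = "other"     # wildcard standing for "other" TBX types

# Simplified rules: a TBX side (name, type, text) and a string target.
RULES = [
    {"name": "termNote", "type": "partOfSpeech", "text": "noun", "target": "exp:catCode=N"},
    {"name": "termNote", "type": "partOfSpeech", "text": DUMMY, "target": "exp:catCode=" + DUMMY},
    {"name": "descrip", "type": OTHER, "text": DUMMY, "target": "exp:text=" + DUMMY},
    {"name": "admin", "type": OTHER, "text": DUMMY, "target": None},  # hypothetical: suppress
]

def exactness(rule, name, typ, text):
    """Return None if the rule does not match, else a rank (larger is more exact)."""
    if rule["name"] != name:
        return None
    type_exact = rule["type"] == typ
    if not type_exact and rule["type"] != OTHER:
        return None
    text_exact = rule["text"] == text
    if not text_exact and rule["text"] != DUMMY:
        return None
    return (type_exact, text_exact)

def convert(rules, name, typ, text):
    """Pick the most exact rule (first one wins on ties) and fill in its target."""
    best, best_rank = None, None
    for rule in rules:
        rank = exactness(rule, name, typ, text)
        if rank is not None and (best_rank is None or rank > best_rank):
            best, best_rank = rule, rank
    if best is None or best["target"] is None:
        return None  # no rule, or an empty target: nothing is written
    return best["target"].replace(DUMMY, text)

print(convert(RULES, "termNote", "partOfSpeech", "noun"))
print(convert(RULES, "termNote", "partOfSpeech", "verb"))
```

The first two rules together realise the "change some values and keep the rest" strategy: noun is rewritten to N, and any other part of speech value passes through unchanged.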

TF to TBX conversion

The TFS to TBX converter is a Jena writer TF2TBXWriter.java embedded in the TF rewrite utility. It can be run with a command line like

rewrite ctryCode.owl TBX > ctryCode.xml

The converter uses the same RDF document (by default, etc/tbx-mapping.rdf) as the TBX to TF converter for finding conversion correspondences.

With the rewrite command line switch verbose, the converter prints information about mapping rule matches:

rewrite file:Place.owl TBX verbose
...
</termEntry>
<termEntry id="http://tfs.cc/term/hasDefinition">
  <descrip type="instanceOf">owl:ObjectProperty</descrip>
  <!-- *** No rule: *** http://www.w3.org/2002/07/owl# rdfs:subPropertyOf term:hasDescription -->
...

The conversion mapping between TF and TBX can be extended using bridge ontologies and a reasoner. Bridge ontologies define correspondences between TF properties and TBX data categories. Using a reasoner, properties in the TF ontology which are not covered by rules in the mapping file can be rephrased into properties which are covered there. For instance, it may be enough to define a general mapping rule for the class meta:Description and let the reasoner and bridge ontology entail the mapping from a variety of different types of description to that common class.

The general insight from our conversion efforts is that syntactic rewriting in terminology format conversion can be significantly simplified by using TF semantic conversion (RDF/OWL entailment with bridge axioms) as an intermediary. Also the conversion pipeline becomes more transparent because the TF internal conversion steps have a clear semantics.

TF conversion

TF to cparse conversion

A TF terminology can be converted into multilingual lexicons for the constrained language parser/generator cparse. (version 0.0) The converter (Terminator2) is implemented in Java using the Jena RDF/OWL library.

MultiTerm2xhtml and MultiTerm2FO

XSLT conversion scripts from MultiTerm XML format to TBX and back are in the plans.

Conversion scripts MultiTerm2xhtml.xsl and MultiTerm2FO.xsl convert MultiTerm vocabularies into xhtml and FO formats. Using the Apache FOP processor, MultiTerm vocabularies get further converted into multilingual PDF (including simplified Chinese).

WordNet

WordNet conversion

WordNet 2.0 was converted for the W3C by Mark van Assem. More recently, the conversion has been updated to WordNet 3.0. The advantages claimed for the W3C conversion over other versions are that it is complete, uses slash URIs, provides OWL semantics while still being interpretable by RDFS infrastructure, provides a Basic and Full version, and provides URIs for words. (The Basic version provides a table of full synonyms: orthographic variants, abbreviations and such.)

Talis.com provides a SPARQL endpoint for WordNet 3.0.

Variants of the following query can be used to inspect the contents of the Talis.com store:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DESCRIBE ?inst
WHERE {
  ?inst rdfs:label ?label .
  FILTER(REGEX(STR(?label),"^entity"))
}
LIMIT 3

Another conversion of WordNet 3.0 to OWL is available from the SUMO site ontologyportal.org. This version has been linked to SUMO.

The data models of both WordNet to RDF/OWL conversions match the TF sign structure. Word, WordSense, and Synset are distinguished as subclasses of TF form, sign, and meaning. Both conversions have been bridged to TF. More precisely, wn/TFwn.owl bridges vu.nl WordNet to TFTop.owl, and TFwn.owl imports wn-schema-align-sumo-vunl.ttl which aligns vu.nl WordNet with SUMO WordNet.

The two conversions use different URIs for WordNet resources. Van Assem's conversion uses descriptive synset URIs constructed from representative word senses. The SUMO conversion constructs synset URIs from WordNet synset IDs. Alignment files wn-schema-align.ttl, synset-align.ttl, and sense-align.ttl are provided between the SUMO and vu.nl conversions. The schema alignment is provisional, since at the time of writing, the new vu.nl 3.0 schema files are unfinished and do not match the data files provided. The provisional schema alignment prefers the 2.0 schema vocabulary used in the data files over namespaces used in the new schemas, to minimise the need for refactoring the data. The instance alignment files define owl:sameAs mappings between the URIs of the two conversions.

There are numerous further differences between the conversions. Some relations that are sense relations in vu.nl are meaning relations in SUMO (e.g. antonyms, derivatives; here the vu.nl representation is more precise). The WordNet specific word class of derived ("satellite") adjectives is merged with other adjectives in SUMO (but reflected in the wn:pertainym relation). SUMO WordNet contains irregular word forms. The SUMO WordNet.owl.rdf had some schema and character errors that have been fixed in wordnet-sumo.owl.

All of WordNet 3.0 has been translated into Finnish in the FIN-CLARIN project. For TF, the Finnish WordNet translations have been converted from XML to OWL for both the SUMO and vu.nl versions, using XSL scripts fiwn-sumo.xsl and fiwn-vunl.xsl. The translations are in fiwn-all-sumo.owl and fiwn-all-vunl.owl, respectively. The current TF WordNet conversion uses the W3C namespace.

Show WordNet example

The conversion used the following commands. The sed script does tf3 encoding and fixes entity references.

xsltproc fiwn-sumo.xsl fiwn-all.xml > fiwn-all-sumo-raw.owl
saxon -xsl:fiwn-vunl.xsl fiwn-all.xml > fiwn-all-vunl-raw.owl
sed -f tf3encode.sed fiwn-all-sumo-raw.owl > fiwn-all-sumo.owl
sed -f tf3encode.sed fiwn-all-vunl-raw.owl > fiwn-all-vunl.owl

The vu.nl conversion needs synset-align.ttl for vu.nl synset URIs. After the conversion, doctype entity declarations in fiwn-doctype.txt are inserted in the prolog just after the xml declaration. The namespace for the Finnish translations is provisionally http://tfs.cc/wn (set in the xml:base attribute in the xsl scripts).

The following scripts extract a sample collection of words, senses, and meanings from each of the two WordNet conversions and their corresponding Finnish translations. The result can be checked in Protege by loading either pair of files and classifying the ontologies with a reasoner. (For the vu.nl conversion, also load schema-align.ttl, which defines vu.nl property inverses.)

# vunl
sparql --query d-nwordsynsets-vunl-1000.sparql \
  --data ./download/raw/rdf_wordnet-synset.ttl \
  --data ./download/full/rdf_full_wordnet-wordsense-synset-relations.ttl \
  --data ./download/full/rdf_full_wordnet-wordsensesandwords.ttl > d-nwordsynsets-vunl-1000.ttl
sparql --query d-fiwordsynsets-vunl-1000.sparql \
  --data fiwn-all-vunl.owl \
  --data rdf_wordnet-synset.ttl > d-fiwordsynsets-vunl-1000.ttl
# sumo
sparql --query d-wordsynsets-sumo-1000.sparql --data wordnet-sumo.owl > d-wordsynsets-sumo-1000.ttl
time sparql --query d-fiwordsynsets-sumo-1000.sparql --data fiwn-all-sumo.owl > d-fiwordsynsets-sumo-1000.ttl

The above scripts take on the order of ten minutes to run per entry. In order to mass produce synset entries, we indexed the RDF descriptions in the Turtle format WordNet files by synset id and split the description files textually at paragraph boundaries into synset entry size files. The Finnish translations file fiwn-all-vunl.ttl and the relevant vu.nl English WordNet files are shown below with sample contents.

./fiwn/fiwn-all-vunl-ttl
fiwn:fi-Akheron-joki-N rdf:type wn20schema:Word ;
    rdfs:label "Akheron-joki"@fi ;
    wn20schema:sense fiwn:fi-Akheron-joki-N_-_WN30-109186709 .

./vunl/download/raw/rdf_wordnet-synset.ttl
wn30:synset-1530s-noun-1 wn20schema:synsetId 115148787 .

./vunl/download/full/rdf_full_wordnet-wordsensesandwords.ttl:
wn30:wordsense-zymurgy-noun-1 wn20schema:word wn30:word-zymurgy .

./vunl/download/full/rdf_full_wordnet-wordsense-synset-relations.ttl
wn30:synset-zymotic-adjective-2 wn20schema:containsWordSense wn30:wordsense-zymotic-adjective-2 .

The translations are already indexed by synset id. We used sparql to index the vunl words and senses files with the synsetId datatype property present in the synset file. These synset-indexed files were then split per synset id into over 100K entry files. This took just over an hour wallclock time.

time perl split.perl finnish.ttl synset-relations.ttl senses.ttl
real 61m33.559s
user 0m27.950s
sys 0m31.198s
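The splitting step can be sketched in Python as follows. This is not a transcription of split.perl, whose contents are not shown here. It assumes that descriptions are separated by blank lines and that each description mentions its synset id in the form WN30-<digits>, as in the Finnish sample above. The function name split_by_synset is hypothetical.

```python
import re
from collections import defaultdict

# Assumed id pattern, taken from the Finnish sample (e.g. WN30-109186709).
SYNSET_ID = re.compile(r"WN30-(\d+)")

def split_by_synset(turtle_text):
    """Group blank-line separated descriptions by the synset id they mention."""
    entries = defaultdict(list)
    for paragraph in re.split(r"\n\s*\n", turtle_text.strip()):
        match = SYNSET_ID.search(paragraph)
        if match:
            entries[match.group(1)].append(paragraph)
    return dict(entries)

sample = """fiwn:fi-Akheron-joki-N rdf:type wn20schema:Word ;
    rdfs:label "Akheron-joki"@fi ;
    wn20schema:sense fiwn:fi-Akheron-joki-N_-_WN30-109186709 .

fiwn:fi-Akheron-N rdf:type wn20schema:Word ;
    wn20schema:sense fiwn:fi-Akheron-N_-_WN30-109186709 .
"""

groups = split_by_synset(sample)
print(sorted(groups), len(groups["109186709"]))
```

Each group can then be written to a file named after its synset id. The second description in the sample is invented for the illustration.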

The w3c conversion of the core Wordnet predicate s/6 is documented like this.

s(Synset_ID,W_num,Word,Ss_type,Sense_number,Tag_count).

An s operator is present for every word sense in WordNet. In wn_s.pl, W_num specifies the word number for the word in the synset.

The arguments of the clause are the following:

  1. Synset ID: unique number for a synset. If the ID starts with 1, the synset contains only nouns; with 2, verbs; with 3, adjectives; with 4, adverbs;
  2. Word number: provides a number for the word sense within the synset (not ordered)
  3. Lexical form: a string, possibly containing a hyphen (connecting collocated words), an underscore (stands for a space between two collocated words), and escape sequences to encode diacritics;
  4. Sense type: value is one of the set {n, v, a, s, r} which stands for noun, verb, adjective, adjective satellite and adverb, respectively;
  5. Sense number: gives a number to the sense in which the lexical form is used that is unique for the sense type (e.g. there are ten different nouns with the lexical form "bank" numbered 1 to 10; there are eight different verbs with the lexical form "bank" numbered 1 to 8);
  6. Tag count: frequency of this word sense measured against a text corpus.

Each s(...) represents one word sense. All s(...) with the same ID together form the whole synset.

The vu.nl conversion maps this to:

  • Synset's subclasses: NounSynset, VerbSynset, AdverbSynset, AdjectiveSynset, AdjectiveSatelliteSynset
  • Word
  • WordSense
  • containsWordSense(Synset,WordSense) - inverse: inSynset
  • synsetId(Synset, xsd:nonNegativeInteger)
  • tagCount(WordSense, xsd:nonNegativeInteger)
  • word(WordSense, Word) - inverse: sense
  • lexicalForm(Word, xsd:string) - superproperty: rdfs:label
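To make the mapping concrete, here is a hedged Python sketch that parses one s/6 clause and emits triples with the vocabulary listed above. It is not the vu.nl converter: the node identifiers are hypothetical placeholders (the real conversion builds descriptive URIs from representative word senses), the property names are given without namespace prefixes, and the sample clause is illustrative.

```python
import re

# s(Synset_ID,W_num,'Word',Ss_type,Sense_number,Tag_count).
CLAUSE = re.compile(r"s\((\d+),(\d+),'((?:[^']|'')*)',([nvasr]),(\d+),(\d+)\)\.")

SYNSET_CLASS = {
    "n": "NounSynset",
    "v": "VerbSynset",
    "a": "AdjectiveSynset",
    "s": "AdjectiveSatelliteSynset",
    "r": "AdverbSynset",
}

def parse_s(clause):
    """Parse one s/6 clause into a record with typed fields."""
    match = CLAUSE.fullmatch(clause.strip())
    if match is None:
        raise ValueError(f"not an s/6 clause: {clause!r}")
    synset_id, w_num, word, ss_type, sense_number, tag_count = match.groups()
    return {
        "synset_id": int(synset_id),
        "w_num": int(w_num),
        "word": word.replace("''", "'"),  # Prolog doubles quotes inside atoms
        "ss_type": ss_type,
        "sense_number": int(sense_number),
        "tag_count": int(tag_count),
    }

def to_triples(rec):
    """Emit triples in the listed vocabulary, using placeholder node names."""
    synset = f"synset-{rec['synset_id']}"
    sense = f"wordsense-{rec['word']}-{rec['ss_type']}-{rec['sense_number']}"
    word = f"word-{rec['word']}"
    return [
        (synset, "rdf:type", SYNSET_CLASS[rec["ss_type"]]),
        (synset, "synsetId", rec["synset_id"]),
        (synset, "containsWordSense", sense),
        (sense, "word", word),
        (sense, "tagCount", rec["tag_count"]),
        (word, "lexicalForm", rec["word"]),
    ]

record = parse_s("s(100001740,1,'entity',n,1,11).")
triples = to_triples(record)
for triple in triples:
    print(triple)
```

The inverse properties inSynset and sense need not be emitted, since a reasoner can derive them from the schema.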

Ad hoc conversion to TF

As an example of conversion from a typical tabular file format (entry per row, concept and language sets per column) through TBX to TF, we converted a multilingual welfare vocabulary via TBX Basic to TF. The table-to-TBX conversion was made with a simple Perl script txt2tbx.perl.

A conversion from TF back to PULS ontologies is in the plans. It will allow PULS to use ontologies developed in TF without changing the internals of the PULS system.

TF schema profiles

TF profiles harp on a theme familiar from database design and relational database normalisation, now transposed to the ontology instrument. The question is which players on the terminology field become resources, i.e. first class (= first order) entities, and which can remain virtual (classes, properties and roles). There are two reasons to reify some feature into a resource: if it can have properties, or if positing it makes the model smaller. An answer to this question is also implicit in the various terminology theoretical definitions of what a term really "is".

An old standard, DIN 2342 (Begriffe der Terminologielehre), defines a term as „zusammengehörige Paar aus einem Begriff und seiner Benennung als Element einer Terminologie“ (the pair formed by a concept and its designation, taken together as an element of a terminology). TF Full conforms to this definition.

In contrast, the ISO 1087-1:2000 definition of term reads:

verbal designation (3.4.1) of a general concept (3.2.3) in a specific subject field (3.1.2)
Note:
A term may contain symbols and can have variants, e.g. different forms of spelling.

One (perhaps not the intended) reading of this definition is that a term is a verbal expression, considered in the role of a designation of a general concept. (Another ISO formulation, 'Designation of a defined concept in a special language by a linguistic expression' [ISO 1087:1990], is vague between two construals.) A literal ontological reading of the latter definition would be

term:Term rdfs:subClassOf exp:Expression ,
    [ rdf:type owl:Restriction ;
      owl:onProperty term:designationOf ;
      owl:allValuesFrom ont:Concept ] .

In this construal, term:Term is a role of exp:Expression in the technical sense of OWL. It is that subclass of exp:Expression whose members designate special language general concepts. This reading is a variety of the TF Legacy profile. On this reading, terms/expressions shared between domains are not just similar, they are the same. This is because here term:Term is not reified into an individual. An expression may have many names corresponding to its roles as designation for different concepts, but the names denote the same entity. The terms cannot have conflicting term properties, or conflicting term and expression properties (say, owner), since they denote the same thing.

More specifically, we can distinguish two limiting case profiles for TF schema, provisionally called TF Lite and TF Full. TF Lite is the base case, other profiles are implemented as additional axiom sets on the TF schema. TF Lite adds no axioms. Intermediate profiles differ in whether they separate concepts from terms (Legacy) or expressions from senses (Dictionary). In Legacy terminology work, expressions need not be separated from terms, since in normative monolingual terminology expressions are monosemic. In descriptive Dictionary work, where word senses are assumed to be unique (no full synonyms), there is no need for language-independent concepts. TF Full separates all three. The three-way separation is motivated in multilingual cross-domain term collections.

There is an asymmetry between the two separations. The separation of terms from expressions is fine tuning. In Legacy, a term is a role: a term IS an expression that denotes a given concept. Also TF Full allows term:Term to be a subclass of exp:Expression. Terms can stay a special case of expressions. Mixing concepts with terms causes an object-metalanguage metonymy.

A term and a concept are on different sides of the Tarskian semantic divide: the former is a piece of language, the latter a piece of the world. For a concept (class) to include another is thus different from a term (expression) including another. Men are included in animals, but the word (sense) man is not included in (the word sense) animal. (If anything, man, as a string, is included in woman.) On pain of category error, the TF Dictionary profile must distinguish meta level entailment properties between terms to match object level denotational relations between concepts. E.g. sign:hyponymOf, sign:synonymWith, sign:meronymOf are term instance properties matching rdfs:subClassOf, owl:equivalentClass, ont:partOf in TFS concept (class/instance) vocabulary. Which vocabulary should be used between WordNet synsets depends on whether synsets are construed syntactically as (representatives of collections of) signs or semantically as instances of class Meaning. (We choose the latter.) The relation of two terms that share the referent is sign:synonymWith. It is also the relation holding between a term and a terminological definition of the same concept. This property lets us associate a definition text with a term in the same language, while maintaining with terminology theory that a definition defines the concept, not the term.

Comparing Lite semantics to Full semantics, the main difference is whether meaning is treated as a first order (syntactic) relation between terms or a second order relation between classes. Technically, the distinction is between using application specific instance level properties like sign:hyponymOf or class level properties like rdfs:subClassOf. The former support dealing with taxonomies of named entities. OWL class semantics provides the power of description logic reasoning: Boolean and some relational reasoning about classes defined by description, not just by name.

To support the different semantics, tools will be provided to rewrite an ontology from one level to another. Among other things, this simplifies conversion to TF. Conversion libraries only need to support direct conversion to the closest subset of TF. TF specific SPARQL conversion scripts automate upgrading/downgrading between levels by automating the necessary splits/merges. TF Lite may be an easier target to convert to from contrastive terminologies of the lexicographical type. On the other hand, TF Full is more transparent to OWL reasoning. See section on TF profile conversion.
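As an illustration of what such a level rewrite involves, the following Python sketch lifts term level relations to concept level relations using the property correspondences named earlier (sign:hyponymOf, sign:synonymWith, sign:meronymOf). It is a toy over plain triples, not one of the TF SPARQL scripts: the function lift, the referent table and the term names are hypothetical.

```python
# Term level (meta) property -> concept level (object) property.
META_TO_OBJECT = {
    "sign:hyponymOf": "rdfs:subClassOf",
    "sign:synonymWith": "owl:equivalentClass",
    "sign:meronymOf": "ont:partOf",
}

def lift(term_triples, referent_of):
    """Restate relations between terms as relations between their referents."""
    lifted = []
    for s, p, o in term_triples:
        if p in META_TO_OBJECT and s in referent_of and o in referent_of:
            lifted.append((referent_of[s], META_TO_OBJECT[p], referent_of[o]))
    return lifted

# Hypothetical data: each term mapped to its referent concept.
referent_of = {"en-man-N_-_Man": "Man", "en-animal-N_-_Animal": "Animal"}
term_level = [("en-man-N_-_Man", "sign:hyponymOf", "en-animal-N_-_Animal")]

concept_level = lift(term_level, referent_of)
print(concept_level)
```

Downgrading would run the same table in reverse, from class level axioms to instance level properties between the corresponding terms.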

TF Lite

A TF Lite ontology does not require that the expressions, terms and concepts are disjoint. TF Lite need not split terms, concepts, and expressions into different resources. The same resource can play many roles. Since TF does not apply the unique name convention, a given resource can be given different aliases in different roles. These properties of TF Lite leave many descriptive options to play with. One and the same resource, perhaps aliased using different URIs, can play all three roles of term, concept, and expression.

Descriptive translation-oriented contrastive terminography does not start from, or aim at, interlingual concept harmonisation, but operates with pairwise term comparisons. In a TF Lite implementation of this approach, terms start out as undivided entities. At the start of the game, we collect a cluster of similar terms like fi-metsä-N_-_FI-Forest, en-wood-N_-_GB-Wood, en-forest-N_-_GB-Forest, ru-les-N_-_RU-Les. Instead of creating separate instances for term en-forest-N_-_GB-Forest, its expression en-forest-N and meaning GB-Forest, we just use en-forest-N and GB-Forest as aliases for the term itself. Comparisons between terms are reified by contrastive term instances which have the source term of the comparison as the referent and the target term of the comparison as the expression.

For instance, comparing English forest to wood, the latter appears to be a narrower term than the former (a wood is a small forest). This observation is reified into a contrastive term en-wood-N_-_GB-Forest. This contrastive term is marked as an approximate match: specifically, it has the property term:approximate "sub". A sample of this approach is shown below.

TF Lite contrastive terms

The description of the same contrastive facts in TF Full, where terms, concepts and expressions are separated, is not very different. We eliminate each contrastive/approximate term by replacing it with a new exact term whose concept is related to that of the contrastive/approximate term with rdfs:subClassOf as appropriate. To wit: if approximate term E_-_R has property term:approximate "sub" , and term E_-_C is exact, then C rdfs:subClassOf R . If E_-_R has property term:approximate "super" , then R rdfs:subClassOf C . If E_-_R has property term:approximate "true" , then C and R have a nonempty common subclass. The best we can do about this is add a class S for the common subclass and assert S rdfs:subClassOf C . S rdfs:subClassOf R . (There is no explicit way in OWL to say that S is nonempty, short of putting named instances into it.) The previous contrastive sample converted to TF Full might look as follows.

TF Full contrastive terms
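The three rewriting rules for approximate terms can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the TF conversion script; the function name approximate_to_subclass and the default class name "S" are assumptions made for the example.

```python
# Illustrative sketch only: maps a term:approximate value to the
# rdfs:subClassOf axioms described in the text. Names are hypothetical.
SUBCLASS = "rdfs:subClassOf"

def approximate_to_subclass(kind, c, r, common="S"):
    """Given exact term E_-_C and approximate term E_-_R, return the
    class axioms that replace the approximate term in TF Full."""
    if kind == "sub":      # C is narrower than R
        return [(c, SUBCLASS, r)]
    if kind == "super":    # C is broader than R
        return [(r, SUBCLASS, c)]
    if kind == "true":     # C and R overlap: add a common subclass
        return [(common, SUBCLASS, c), (common, SUBCLASS, r)]
    raise ValueError("unknown term:approximate value: %r" % kind)

# wood is an approximate ("sub") match for forest
print(approximate_to_subclass("sub", "GB-Wood", "GB-Forest"))
```

For the forest/wood example, the call yields the single axiom that GB-Wood is a subclass of GB-Forest.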

Separation axioms

To be precise, separation of terms and concepts can mean just conceptual separation where a resource is typed as either a term:Term or ont:Concept, and it has concept or term properties according as it belongs to one or the other of the classes. Relations like term:hasReferent relate members of the former class to members of the latter class. This conceptual separation does not imply that terms and concepts are disjoint. It only concerns domains and ranges of properties and class membership. It is still consistent for a term and its referent to denote the same resource, even to have the same URI. If, in addition, terms and their referents are required to be distinct resources, we say that the separation is strict.

A TF ontology can be tested for strict conformance to one of the TF profiles using separation axioms in LegacyStrict.owl and DictionaryStrict.owl . LegacyStrict checks that signs are separate from their meanings. DictionaryStrict asserts that signs are separate from their forms. The following Pellet queries check whether signs, forms, and meanings can be separated in ContrastiveLite.owl:

pellet consistency LegacyStrict.owl ContrastiveLite.owl
pellet entail -e LegacyStrict.owl ContrastiveLite.owl

However, as was pointed out earlier, the points of the TF triad are relative, not absolute. While it makes sense to make semiotic relations like term:hasReferent irreflexive, so as to separate object and metalanguage levels, categorical separation axioms between the members of the semiotic triad are not strictly true. A special language term can be designated by a general one: for instance, the English mathematical term for plus or minus is designated by the common English noun (a sense of the English word) sign . Or a sign can be the referent of another sign; for example, a quote refers to the text without the quotes. Given no unique name assumption, a TF ontology can choose whether URIs for terms, concept instances, and expressions point to different entities. They can be identified using owl:sameAs and separated using owl:differentFrom . The following figure illustrates different ways of cutting the TF pie.

TF top ontology

TF profile conversion

Consider what the freedom from ontological commitment in TF means from the point of view of data complexity and terminology evolution. Say we get a simple list of names. We can first enter them as instances of class Term. Sometime later the names get associated with some content. As long as there is no ambiguity, we may associate the content directly to the term instances. There is no need to split terms from expressions yet because there is nothing to share. When a need arises to disambiguate the expression, we may split the shared expression off the different terms. Dually, when a need arises to share content, we can split the shared meaning into a concept and relate synonyms to that. All this can be done just in time, on demand, not ex ante just because the data structure requires it.

Say we find at some point that we need to split a vague concept into two. For instance, the English word parse can be a noun or a verb, denoting the process of parsing or its result. For other languages like Finnish, the two concepts need to be separated, because they go with different words. We can keep the vague concept that holds the shared information and create new concepts/expressions that inherit shared features from the vague concept, and add the distinguishing features to new subconcepts. If the process/result ambiguity is pervasive, it can be coded as an axiom, rule and/or template.

parse

The TF language profiles TF Lite, TF Full, and their intermediates serve among other things to help conversion of content to TF. The idea is that a given third party terminology collection can be converted to the TF profile that is closest to the source structure. Afterwards, when appropriate, the content can be further converted inside TF to conform to another profile.

Experience with various conversions (echa and kyamk glossaries from Excel tables and puls from Lisp source) showed that conversion from ad hoc formats directly to TF is often easier than conversion through TBX. This is thanks to the non-hierarchical nature of RDF/OWL, which makes it possible to flexibly merge piecemeal partial information about resources obtained from different places in the source.

An experimental MS Excel workbook macro has been written which converts a simple Excel table of term information into RDF triples conforming to TF Lite. The triples are then converted stepwise into more complex TF profiles using TF internal ontology tools.

TF Lite does not separate concepts and expressions from terms. It is the easiest target to convert to from contrastive terminologies of the lexicographical type.

TF Full separates concepts, terms, and expressions. It is most useful with multiple-domain term collections (like TF itself).

To support the different semantics, tools are provided to rewrite an ontology from one level to another. Conversion libraries only need to support direct conversion to the closest subset of TF. TF specific SPARQL conversion scripts automate upgrading/downgrading between levels by automating the necessary splits/merges. A subtask is under way to provide a library of conversion scripts between the different TF profiles and associated validation and matching tools.

A TF Lite ontology can be converted to TF Full with a sequence of TF queries and rewrites. Consider the following mini TF Lite ontology lite.tf3 . It contains just one Lite term lite:item carrying concept, term, and expression properties indiscriminately. The property types according to TFProp.owl are given at the end of the file.

> cat lite.tf3
@prefix term1: <http://tfs.cc/term1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix term: <http://tfs.cc/term/> .
@prefix exp1: <http://tfs.cc/exp1/> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix lite: <http://lite.org/lite#> .
lite:item rdf:type term:Term .
lite:item exp:baseForm "hello world" .
lite:item exp:langCode "en" .
lite:item exp:catCode "V" .
lite:item exp:hasOrigin [ exp:text "hello world, this is me" ] .
lite:item term:status "proposal" .
lite:item lite:hasProperty "anything" .
exp:langCode rdf:type meta:expressionDataProperty .
exp:baseForm rdf:type meta:expressionDataProperty .
exp:catCode rdf:type meta:expressionDataProperty .
exp:hasOrigin rdf:type meta:expressionObjectProperty .
term:hasDesignation rdf:type meta:termObjectProperty .
term:status rdf:type meta:termDataProperty .

The rewrite script lite rewrites this ontology step by step as described in the comments.

> cat lite
#!/bin/bash
# lite1 splits expressions off lite terms
./lite1
# lite2 splits concepts off lite terms
./lite2

> cat lite1
#!/bin/bash
#lite1: split off expressions from lite terms
#create expressions for lite terms
pellet4tf query -q t2esplit.sparql -M lite.tf3 -o TF3 lite.tf3 > lite1.tf3
#rewrite blanks as resources in namespace urn:blank:
rewrite remove blanks lite1.tf3 TF3 > lite2.tf3
#copy expression properties from term to expression
pellet4tf query -q t2ecopy.sparql -M lite2.tf3 -o TF3 lite2.tf3 > lite3.tf3
#remove expression properties from terms
pellet4tf query -q t2edel.sparql -m -M lite3.tf3 -o TF3 lite3.tf3 > lite4.tf3
#create TF uris for blank expressions
rewrite label create replace in=urn:blank: lite4.tf3 TF3 > lite5.tf3
#replace resources in namespace urn:blank: with blanks
#rewrite replace blanks lite5.tf3 TF3 > lite6.tf3

> cat lite2
#!/bin/bash
#lite2: split off concepts from lite terms
#create full terms for lite terms
pellet4tf query -q t2csplit.sparql -M lite5.tf3 -o TF3 lite5.tf3 > lite6.tf3
#rewrite blanks as resources in namespace urn:blank:
rewrite remove blanks lite6.tf3 TF3 > lite7.tf3
#copy term properties from lite term to full term
pellet4tf query -q t2ccopy.sparql -M lite7.tf3 -o TF3 lite7.tf3 > lite8.tf3
#remove term properties from lite term
pellet4tf query -q t2cdel.sparql -m -M lite8.tf3 -o TF3 lite8.tf3 > lite9.tf3
#create TF uris for blank terms
rewrite label create replace in=urn:blank: lite9.tf3 TF3 > lite10.tf3
#replace resources in namespace urn:blank: with blanks
rewrite replace blanks lite10.tf3 TF3 > lite11.tf3

The result of the conversion is the following TF Full ontology.

> cat lite11.tf3
@prefix term1: <http://tfs.cc/term1/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix exp1: <http://tfs.cc/exp1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lite: <http://lite.org/lite#> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix term: <http://tfs.cc/term/> .
_:_1886045197 exp:text "hello world, this is me" .
exp1:en-hello_world-V a exp:Expression .
exp1:en-hello_world-V exp:baseForm "hello world" .
exp1:en-hello_world-V exp:catCode "V" .
exp1:en-hello_world-V exp:hasOrigin _:_1886045197 .
exp1:en-hello_world-V exp:langCode "en" .
exp:baseForm a meta:expressionDataProperty .
exp:catCode a meta:expressionDataProperty .
exp:hasOrigin a meta:expressionObjectProperty .
exp:langCode a meta:expressionDataProperty .
lite:item lite:hasProperty "anything" .
term1:en-hello_world-V_-_lite-item a term:Term .
term1:en-hello_world-V_-_lite-item term:hasDesignation exp1:en-hello_world-V .
term1:en-hello_world-V_-_lite-item term:hasReferent lite:item .
term1:en-hello_world-V_-_lite-item term:status "proposal" .
term:hasDesignation a meta:termObjectProperty .
term:status a meta:termDataProperty .
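To make the logic of the split easier to follow, here is a minimal Python sketch of the same Lite to Full rewrite on plain (subject, predicate, object) tuples. It is not the pellet4tf/rewrite pipeline. The function name lite_to_full is hypothetical, the property sets are hard-coded rather than read from TFProp.owl, and the URI label pattern (langCode-baseForm-catCode, joined to the referent with _-_) is inferred from the sample output above.

```python
# Hypothetical sketch of the Lite -> Full split; not the TF tools.
EXPRESSION_PROPS = {"exp:baseForm", "exp:langCode", "exp:catCode", "exp:hasOrigin"}
TERM_PROPS = {"term:status", "term:hasDesignation"}

def lite_to_full(triples, item):
    """Split expression and term properties off the lite term `item`."""
    props = {p: o for s, p, o in triples if s == item}
    # Label pattern assumed from the sample: en-hello_world-V
    label = "%s-%s-%s" % (props["exp:langCode"],
                          props["exp:baseForm"].replace(" ", "_"),
                          props["exp:catCode"])
    exp = "exp1:" + label
    term = "term1:%s_-_%s" % (label, item.replace(":", "-"))
    out = [(exp, "rdf:type", "exp:Expression"),
           (term, "rdf:type", "term:Term"),
           (term, "term:hasDesignation", exp),
           (term, "term:hasReferent", item)]
    for s, p, o in triples:
        if s != item:
            out.append((s, p, o))        # unrelated triples pass through
        elif p in EXPRESSION_PROPS:
            out.append((exp, p, o))      # move to the new expression
        elif p in TERM_PROPS:
            out.append((term, p, o))     # move to the new full term
        elif (p, o) == ("rdf:type", "term:Term"):
            continue                     # the typing moves to the full term
        else:
            out.append((s, p, o))        # the rest stays on the referent
    return sorted(out)
```

As in the sample, the original resource lite:item ends up as the referent and keeps only the properties that are neither expression nor term properties.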

Virtual ontologies

TF profile conversion on demand can be supported by the TF services as follows. The sequence of sparql queries and rewrite steps that stepwise bring a TF Lite ontology into the TF Full normal form can also be carried out through the TF services and coded in the form of TF query URLs. There are at least two ways of doing such virtual conversion:

  • use query import where the query generates the TF ontology from the skos ontology at load time
  • use bridge ontology and reasoner to generate TF ontology triples from skos triples at query time

A complete conversion procedure can be coded in the form of an ontology which imports the results of a number of queries on the source ontologies and intermediate ontologies based on the same principle. At the leaves of the imports tree are instances of the ontologies to convert. The conversion ontology represents the conversion process and its result at the same time. The conversion gets in effect rerun every time the ontology is loaded. Since the process can take a while, TF caching is useful to avoid actually doing the conversion at every load. But by proper version control on the cache, the conversion can be set up to run automatically every time the input ontologies change. In this approach, conversion ceases to be a separate offline affair, and becomes an active part of the TF distributed repository access system. The conversion is not just a once-off process but a type of ontology, a virtual TF ontology.
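The interplay of rerun-on-load and caching can be illustrated with a small Python sketch. It assumes that source ontologies are available as serialised strings and that a change is detected by hashing them; the class VirtualOntology and its members are hypothetical names for this example, not part of the TF services.

```python
import hashlib

class VirtualOntology:
    """Hypothetical sketch: a conversion that reruns only when inputs change."""

    def __init__(self, convert):
        self.convert = convert   # function from list of sources to result
        self._key = None         # fingerprint of the inputs last converted
        self._result = None      # cached conversion result
        self.runs = 0            # how many times the conversion really ran

    def load(self, sources):
        # Fingerprint the inputs; a changed input yields a new key.
        key = hashlib.sha256("\n".join(sources).encode("utf-8")).hexdigest()
        if key != self._key:
            self._result = self.convert(sources)
            self._key = key
            self.runs += 1
        return self._result
```

Loading the same inputs twice runs the conversion once; loading changed inputs triggers a fresh run.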

An even looser coupling to TF can be maintained with the help of bridge ontologies. In this case, the third party ontology is not converted at all. Instead, TF queries are carried out against it by bridging its concepts to TF with a separate bridge ontology. An example is the bridging of the BioCaster epidemic ontology to TF with the bridge ontology biobridge.ttl .

The BioCaster ontology is an example of a Legacy profile term ontology. It distinguishes concepts and terms, but not expressions. Here is an example of how terms look in it:

@prefix : <http://biocaster.nii.ac.jp/biocaster#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix p1: <http://biocaster.nii.ac.jp/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
:Country_235 rdf:type :Country ;
    :ISOCode "VN"^^xsd:string ;
    :hasLink :Wikipedia_04977 , :Wikipedia_04976 ;
    :label "Vietnam"^^xsd:string ;
    :synonymTerm :vietnameseTerm_1188 .
:vietnameseTerm_1188 rdf:type :vietnameseTerm ;
    :hasRootTerm :Country_235 ;
    :label "Việt nam"^^xsd:string .

The English term for a country is of type Country and has an English language label. It is linked to its synonyms in other languages which are linked back to it with the hasRootTerm property. Concepts, terms, and expressions are not separated as distinct entities. Biocaster can be construed as a special case of a TF Full ontology where terms are self-referential and self-designative. These identifications can be spelled out as the following bridge ontology:

term:selfReferent rdfs:subPropertyOf term:hasReferent ;
    rdf:type owl:ReflexiveProperty .
:hasRootTerm rdfs:subPropertyOf term:selfReferent .
:Country rdfs:subClassOf :englishTerm .
:englishTerm rdf:type term:Term ;
    rdfs:subClassOf [ rdf:type owl:Restriction ;
        owl:hasValue "en"^^xsd:string ;
        owl:onProperty exp:langCode ] .
term:hasDesignation rdf:type owl:ReflexiveProperty .
:label rdfs:subPropertyOf exp:baseForm .

The first two axioms say that root terms are self referential. The next pair express that country names are in English. The last two axioms say terms are designations. Using this bridge ontology, a reasoner is able to parse BioCaster on the fly as a TF Full ontology.
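What the reasoner contributes can be approximated by hand for the subproperty axioms alone. The following Python sketch forward-chains rdfs:subPropertyOf over BioCaster triples; it deliberately omits the reflexivity and class restriction axioms, and the function name bridge is an assumption made for this example.

```python
# Sketch only: applies just the rdfs:subPropertyOf axioms of the bridge.
SUBPROPERTY_OF = {
    ":hasRootTerm": "term:selfReferent",
    "term:selfReferent": "term:hasReferent",
    ":label": "exp:baseForm",
}

def bridge(triples):
    """Return the triples closed under the subproperty axioms above."""
    inferred = set(triples)
    changed = True
    while changed:                      # repeat until nothing new is added
        changed = False
        for s, p, o in list(inferred):
            sup = SUBPROPERTY_OF.get(p)
            if sup and (s, sup, o) not in inferred:
                inferred.add((s, sup, o))
                changed = True
    return inferred

biocaster = {(":vietnameseTerm_1188", ":hasRootTerm", ":Country_235"),
             (":vietnameseTerm_1188", ":label", "Việt nam")}
for triple in sorted(bridge(biocaster)):
    print(triple)
```

The closure contains term:hasReferent and exp:baseForm triples for the Vietnamese term, so a TF query phrased in TF vocabulary finds them without the source being converted.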

There is a conversion script io/script/convert4tf similar to the localization script io/script/localize for carrying out on-demand conversions like

convert4tf -t bio:Country -e Stacked -X biobridge.ttl biocaster.ttl

One possibility worth considering is to link ISOcat to TF as another term repository through its web service interface, as a collection of virtual TF ontologies. This idea could also work with the ONKI ontology library. However, as of fall 2011, ONKI is not planning to provide a SPARQL entry point (Osma Suominen, p.c.)

Repositories

We now define what we mean by a TermFactory site and a TF repository. A TF site is a web site identified by a URL and maintained by some organization. It comprises one or more TF back end repositories, one or more TF front end platforms/tools, and some maintenance staff. A TF repository typically consists of one or more web servers with file space, one or more persistent-ontology databases (for now, usually relational databases), and TermFactory web services. TF repositories can communicate with one another through web services without the mediation of TF front ends.

A TermFactory system of repositories consists of RDF/OWL databases at different sites, connected to one another through a web service protocol. In the following figure, each green can represents a site with a TF repository. The arrows represent TF web service connections. The repositories are laid out in a tree to suggest inheritance relationships between the terminology domains that each repository "owns". The situation need not be that simple in practice: sites may also reciprocate (the blue arrow), and all repositories can connect through the protocol. As the image suggests, different (language) regions can take responsibility within each sector for their vernacular terms and reciprocate with their peers for multilinguality.

TF repository network

Each database can store several term collections as OWL ontologies. Each terminology collection is named by a common URI (uniform resource identifier) and an associated namespace prefix. For instance, prefix tfs abbreviates URI http://tfs.cc and goes with the TermFactory ontology schema. Each site stores the ontologies which it owns and manages, and it can also cache or mirror ontologies owned by some other site.

(version 0.2) The current TermFactory ontology resides in http://grapson.com/TF/owl/TermFactory.owl

The TermFactory schema TFS.owl resides in http://tfs.cc/owl/TFS.owl

Persistent repositories

A persistent RDF/OWL database can be a triple store or a relational database (MySQL, for instance) which stores OWL data (as ontology triples or some more storage-efficient form). An OWL database manager built on top of the relational database knows how to store OWL ontologies into the database and retrieve them or answer queries about them. The TF web service layer knows how to send queries between the repositories and relay the answers back to the client.

(version 0.1) Jena code written for creating and accessing persistent TF ontology databases on a MySQL server.

(version 1.0) Pellet4TF can load persistent ontologies from, and store them to, repositories resident on MySQL servers.

(version 1.2) QueryService is able to use DB cache for TF queries. See TF location mappings .

TF includes a Jena TDB native triple store database that maintains RDF graphs in indexed data files on the file system. It can be used by the TermFactory utilities and services as temporary storage. By default, TermFactory queries and edits database models in place instead of reading them into memory. Query results get cached in the database if the wait flag is set. A read-only ontology gets cached in a database the first time it is edited if the database cache flag is set. The default is to have the cacheFS flag on and the cacheDB flag off. The default assumption is that retry location mappings are set up so that a database cached version has precedence, so that editing continues where it stopped.

Revision control

Revision control (also known as version control, source control or (source) code management (SCM)) is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may be changing the same files. Changes are usually identified by a number or letter code, termed the "revision number", "revision level", or simply "revision". For example, an initial set of files is "revision 1". When the first change is made, the resulting set is "revision 2", and so on. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and with some types of files, merged.

Version control systems (VCS) are most commonly stand-alone applications, but revision control is also embedded in various types of software like word processors (e.g. Microsoft Word, OpenOffice.org Writer, KOffice, Pages, Google Docs), spreadsheets (e.g. OpenOffice.org Calc, Google Spreadsheets, Microsoft Excel), and in various content management systems. Integrated revision control is a key feature of wiki software packages such as MediaWiki, DokuWiki, TWiki, etc. In wikis, revision control allows for the ability to revert a page to a previous revision, which is critical for allowing editors to track each other's edits, correct mistakes, and defend public wikis against vandalism and spam.

It may not be worth the trouble to manage TF ontologies as fully dynamic ontology databases with real time shared access, record level locking, etc. Terminology information needs a slower process of collaborative editing on something like a wiki platform (which of course can be backed up or under revision control). Once approved, terminology is moved to a read only section of the database, where it gets updated through batch updates.

OWL revision control literature contains many proposals to log ontology change history explicitly and in full detail as OWL annotations or meta ontologies (Redmond et al.). Versioning systems like the one in the Redmond et al. paper (parsia.com) do not use a syntactic or semantic approach, but a pragmatic one, in that relations between versions are defined by a history of user actions: an item is related to its predecessors through a history of editing changes by a user. We avoid going down this route. In our view, ontology versions should be compared on their current merits, whatever their editing history. Instead, we are thinking in terms of two extremes:

  1. syntactic: standard serialisation of ontologies plus existing third party source revision control systems like svn.
  2. semantic: a facility to run semantic diffs using a reasoner. Horrocks et al. (ref.) show how to test entailment in OWL by running a tableau proof on a source ontology and the complement of a target ontology.

If the tableau construction succeeds, the complement of the closure contains facts not entailed by the source. The semantic diff of two ontologies is their symmetric difference (exclusive or). The first extreme documents editing changes, the second extreme semantic changes.

For TF revision control, there are pre-existing tools for versioning document, memory, and database representations of ontologies. Ontologies as web documents have and surely need both version information and version control. Version information can be provided by OWL annotation properties. Ontology tools like editors allow the user to manage working versions of the ontology under offline development. Persistent ontology revision control, which allows retrieving and comparing to older versions of the same ontology, is best handled with dedicated version control systems outside of the ontology proper. For databases (like MySQL) there are ways to back up and recover (restore) snapshots of the database content either as a whole or incrementally (see HowtoForge). Web content management platforms and wikis have their own version control solutions (cf. a discussion of version control for web content).

During the TF project, the software and ontologies are being kept under the Subversion revision control system. Subversion has extensions for revision management through websites or local filesystems (e.g. Tortoise SVN for Windows, eSVN for Linux).

OWL documentation discusses version annotation per ontology, but notes that the owl:versionInfo property can be used to annotate classes and properties as well as ontologies. With TF versioning a question of grain of such annotations arises (cf. source indications ). Versioning per ontology may be too coarse, versioning on every triple is too fine. TF conventions for annotation property assignment and property inheritance are called for, with rules for conflict resolution for when a triple inherits more than one source or version indication. In XML based entries like TBX, administrative information is scoped by the term entry tree. Semantic Web reasoners enable stating property inheritance using some rule language. Two main approaches present themselves for TF.

  • scoping by query. An entry query retrieves annotation triples relevant for the entry. The scope of the triples is implicitly defined by the structure of the entry.
  • inheritance by rule. Annotations for a triple are inferred by a reasoner using explicit rules .

The approaches can be combined. An entry query collects the annotations, and rules distribute them to the component triples.
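A combined approach might look as follows in Python. The sketch assumes that an entry query has already collected the triples of an entry and the entry-level annotations. The conflict resolution policy used here (an annotation stated directly on a triple overrides the inherited one) is only an assumption for illustration, since TF conventions on this point are still to be defined, and the function name is hypothetical.

```python
def distribute_annotations(entry_triples, entry_annotations, triple_annotations):
    """Give each triple the entry's annotations, overridden by its own.

    entry_triples      -- triples collected by the entry query
    entry_annotations  -- dict such as {"owl:versionInfo": "1.0"}
    triple_annotations -- dict from triple to its own annotation dict
    """
    result = {}
    for triple in entry_triples:
        merged = dict(entry_annotations)                 # inherited values
        merged.update(triple_annotations.get(triple, {}))  # own values win
        result[triple] = merged
    return result
```

A triple without annotations of its own simply inherits the version and source indications of its entry.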

OWL file formats

A web ontology (RDF or OWL) is a many-dimensional graph. As such, it can be written to a linear character stream, or serialised, in many ways, and many ways have been proposed and implemented to date. Here is a table with implementation notes. Some layouts are essentially line-oriented, triple-per-line/element, others build some tree-like groupings. For triple-oriented notations, see Venn diagram here .

name | extension | description | pattern | comments
n-triple | .n3 | line oriented textual layout | subject predicate object . | special case of n3
TF3 | .tf3 | alphabetically sorted triple layout | | special case of turtle
Turtle | .ttl | tree oriented textual layout | subject ( predicate ( object , )* ; )* . | handles uri prefixes and rdf lists
Notation 3 | .n3 | a logic language over triples | subject predicate object . |
RDF/XML | .rdf, .owl | line oriented xml layout | |
RDF/XML-ABBREV | .rdf, .owl | tree oriented xml layout | | special case of RDF/XML
OWL Functional syntax | .tf3 | triple oriented function notation for OWL 2.0 | predicate(subject,object) |
OWL XML | .xml | statement quad oriented layout for OWL 2.0 | (statement-type predicate? subject object) |

Normal forms

This section considers normal forms for TF ontology documents. A normal form is a unique choice among equivalent representations. Reduction to normal form by term rewriting is what many reasoners in effect do. Tableau reasoners virtually reduce a model to disjunctive normal form (DNF). Resolution reasoners reduce to conjunctive normal form (CNF). Normal forms for many different logics exist, which can be made unique by sorting terms. Reduction to normal form reduces equivalence comparison to textual identity. More generally, normalization converts semantic reasoning to syntactic processing. Runtime is saved at the cost of offline compilation and storage.

Carroll/Stickler 2004 is an early proposal for an xml normal form for rdf triples. Dau 2006 discusses normal forms for rdf graphs. He defines two extreme normal forms for rdf graphs as concerns the proportion of nodes to triples. One extreme is a normal form where each fact is represented as a separate triple. The other extreme is a representation where every node appears just once. The duality is similar to that between nested tree (graph or matrix) and nonnested (path or mrs) representations of feature structures or Turtle files.

What about OWL? OWL ontologies are theories in DL, which has a sizable theory of equivalence. There are also a number of proposed CNF normal forms for some description logics ( Hitzler/Eberhart 2007 , Bienvenu 2008 ).

The most promising semantic normal form for TF may be the prime implicant normal form or its dual, the prime implicate normal form. The prime implicant normal form is basically a pruned syntactic version of a tableau (Hintikka model set) construction. (Dually, the prime implicate normal form is a pruned clausal normal form.) Bienvenu 2008 defines PINF for the modal logic K, but it should be extensible to other modal logics with finite model property (Bull and Segerberg ref), including KB, KT and K4. The construction is not efficient (possibly exponential in time and space), but once in (suitably sorted) PINF, two ontologies can be compared syntactically for semantic differences.

Entry normal form

Another notion of normal form is the form used in TF to display entries. This section studies ways to bring a TF ontology into a normal form from which terminological entries can be extracted without involving a reasoner. The idea is to apply reasoning offline to bring an ontology into an equivalent form from which an entry structure can be produced by just filtering triples. The TF schema allows many equivalent traversals of the model graph. For one thing, every property has an inverse. This redundancy can be exploited in conversion. A conversion script can choose which way it traverses a source document to produce a sufficient set of axioms to generate the rest. A related notion of normal form, analogous to relational database normal forms, is provided by TF profile conversion .

This also holds for the TF entry writer. The writer templates decide the traversal, and a schema can bridge third party content to TFS. The current TF2XHTML writer uses the Pellet reasoner to reason about the schema, but only does type level (TBox) reasoning on the schema model. For producing entries from an ontology, it would be faster not to have to call a general purpose reasoner at runtime to fill out missing structure. Reasoners are unfocused and slow for the purpose. Instead, we may write a set of special purpose reasoning steps to apply offline to normalise an ontology for term extraction. In practice, such special purpose reasoning can also happen using sparql scripts. Scripts invp.sparql and dirp.sparql generate from an ontology inverse property triples corresponding to direct property triples in it, and vice versa.
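The effect of a script like invp.sparql can be pictured with a short Python sketch that adds, for each direct property triple, the corresponding inverse triple. It works on plain tuples and takes the inverse table as a parameter; the function name and the inverse property name term:referentOf in the usage example are hypothetical, not taken from the TF schema.

```python
def add_inverse_triples(triples, inverse_of):
    """Return the triples plus an inverse triple for each mapped property.

    inverse_of maps a property name to the name of its inverse.
    """
    out = set(triples)
    for s, p, o in triples:
        inv = inverse_of.get(p)
        if inv is not None:
            out.add((o, inv, s))   # swap subject and object
    return out

# term:referentOf is a made-up inverse name for illustration
direct = {("term1:t", "term:hasReferent", "lite:item")}
print(sorted(add_inverse_triples(direct, {"term:hasReferent": "term:referentOf"})))
```

Running this offline once means the entry writer can find an entry's triples from either end by filtering alone.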

TF triple format

For syntactic revision control, we want to specify a textual normal form for a TF ontology and maintain it using existing text-based versioning systems. The idea is to use svn or some such tool to run diffs on triple files, and visualise the differences in a graph representation of the ontology (say, by colouring).

There is no unique serialisation of RDF/OWL that would allow reliable textual comparison of two ontology documents. OWL APIs and editors based on them do not in general guarantee constant printout either, probably because the serialisation depends on implementation and runtime dependent factors (like hashmaps). (Note: more recent editors show progress on this count.) TF undertakes to produce at least some normal form. The simplest one may be a sorted triple file. To obtain a unique RDF/XML serialisation, a graph traversal order needs to be fixed, plus the grain of the units to compare. A fine grain still readable to the human eye should be preferred for debugging.

We start from the triple representation of an ontology, then sort and rewrite it into a unique normal form. Sorting triples is not enough; we need some way to compare blank nodes between versions. Ontology read/write routines normally rename blanks in order to avoid accidental capture of anonymous variables across ontologies. In versioning, we want to match blank nodes between versions if possible. Jena blank node IDs are long hex numbers (UUIDs) which have a good chance of staying distinct even though they do not have the distinctness guarantee of URLs.

We first considered having a Jena triple reader/writer pair that writes sorted triple files without renaming blank nodes. That was not a good idea, for other tools (quite justifiably) don't respect blank identity. Identifying blank nodes between versions goes against the grain of RDF, for blanks are supposed to be anonymous. Adding to each blank a tf:nodeID property (as in [ tf:nodeID "1" ] ) might work in principle, but gives a messy look. Also, some ontology tools may not handle such extra labels well. A third approach was to invent some canonical numbering of blanks. To test the idea, we first wrote a text based script which parses a triple file, sorts it so that blanks are treated as equal, and then renames the blanks in the sort order with running numbers. The experimental script sort.perl produced identical results from two triple file printouts of TFS.owl.
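The idea behind the experimental script can be sketched in Python as follows. This is not sort.perl itself: the function name canonicalize and the _:bN naming are assumptions, blank nodes are recognised with a simple regular expression, and ties between lines that differ only in their blanks are not resolved, so the result is canonical only when the masked lines are distinct.

```python
import re

# Matches blank node labels such as _:x1 (simplified pattern)
BLANK = re.compile(r"_:[A-Za-z0-9_-]+")

def canonicalize(lines):
    """Sort triple lines with blanks masked, then renumber the blanks."""
    # 1. Sort as if all blank node labels were the same.
    masked_order = sorted(lines, key=lambda line: BLANK.sub("_:", line))
    # 2. Rename blanks in order of first appearance in that sort order.
    names = {}
    def rename(match):
        return names.setdefault(match.group(0), "_:b%d" % (len(names) + 1))
    renamed = [BLANK.sub(rename, line) for line in masked_order]
    # 3. Sort again so the output order reflects the new labels.
    return sorted(renamed)

a = ['_:x1 exp:text "hello" .', 'lite:item exp:hasOrigin _:x1 .']
b = ['lite:item exp:hasOrigin _:q9 .', '_:q9 exp:text "hello" .']
print(canonicalize(a) == canonicalize(b))
```

Two printouts of the same graph that differ only in blank node labels and line order come out identical, which is what textual versioning needs.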

Carroll [Car2003] presents an algorithm for generating canonical names for blank nodes in order to obtain a canonical ordering of the triples of a (possibly slightly modified) RDF graph. TF3 uses a variant of the same idea.

The TF3 or TF triple normal form of a TF ontology aims to minimise free variation between versions, so that standard textual diffs and their visualisations can focus on relevant differences. The TF3 format is a special case of the N3 triple format where blank IDs are standardised using Carroll's idea, and statements are written one per line and sorted alphabetically. The newline character in multiline literals is escaped as "\n". TF3 stays within the part of N3 that is also valid Turtle (Turtle being a subset of N3), so that TF3 files can be read by a Turtle format reader. The older RDF triple file format N-Triples does not support prefixes, so it takes no notice of prefixes on output and doesn't provide any on input. N-Triples is not used in TF.

The current TF3 writer uses blank node IDs that encode the blanks' position in a standard serialisation of the rdf model. Then it writes the statements sorted in alphabetic order. For test cases at least, this seems to work as a textual normal form that allows writing an ontology into a TF3 file, back to RDF/XML and again into the same identical TF3 file. This normal form is close to the RDF triple representation of an OWL ontology, which can be a plus and a minus. See the rewrite utility for details.

With TF3 in hand, we can pinpoint version changes made to an ontology (on the RDF graph level) textually using file comparison utilities like diff. The differences can now also be visualised graphically with the homespun RDF graph visualizer TFVisu.

The TF3 normal form contends with the fact that memory representations of the same RDF graph can have alternative serialisations, due to implementation-dependent grouping and sorting of statements and renaming of blank nodes. The TF2XHTML writer in effect defines another such normal form. All it would take to actually print it out as RDF is an alternative writer that prints out the triples sorted by TF2XHTMLWriter in Turtle format instead of XHTML.

TF entry diff

The XHTML entry writer's original and edits parameters provide another way of displaying version differences. The following command does a diff of two entries. The edit4tf add operator merges the original entry for China (ChinaOld.owl) with the edited version (ChinaNew.owl). The result of the merge is displayed as an XHTML entry with the deleted (original) content stricken out and the new (active) content underlined.

edit4tf XHTML original=ChinaNew.owl edits=ChinaOld.owl ChinaOld.owl ChinaNew.owl add

Date and user

The XHTML writer records the date and time of the write of a TF document in a meta element in the header of the document in the XML date format. This information is preserved in conversion of the document to other TF formats as a date-valued datatype property meta:date.

When a user submits an edit, the MediaWiki plugin sends the name of the currently logged-in user to the edit service, which records it in the XHTML header as a meta element. This information is preserved in conversion of the document to other TF formats as the datatype property meta:user. Other services or operations do not change the last recorded user.

Efficiency

This section surveys ways and means to improve runtime efficiency of TF installations.

TF caching

Ontology reasoning and querying is quite sensitive to repository size. In TF, repositories can get big, but queries and imports need to remain of manageable size. This suggests a modular "small models" approach. There are several possible ways of caching small TF models for quick retrieval. Different types of cache can be pre-populated on a regular basis from the base ontologies using a collection of popular queries.

  • Cache query result sets and edits in TF repository databases
  • Cache ontology documents in repository directories
  • Cache entry html pages on a content management system

DB cache

A TF location mapping retry query takes optional parameters queryDB and cacheDB with boolean values (true or false). The value queryDB=true lets the retry utility try default database location mappings (those prefixed with rdb+) in etc/location-mapping.n3. With queryDB=false (the default), the retry facility skips any rdb+ prefixed location mappings. Note that queryDB only restricts the use of existing location mappings; it does not create any mappings.

The value cacheDB=true makes the retry facility save the result set of the retry query on the default DB database using the query uri as modelID. With the default value cacheDB=false the result of a query is not stored in the database. During editing, it makes sense to have both flags on, so that editing changes are immediately available to subsequent queries.

These options implement a database cache for TF query result sets. Command

retry lsRDB
shows what is currently in the database and
retry notry rdb+file:Place.owl
gets the database cached version. To put Place.owl in the database use
retry notry file:Place.owl cacheDB=true
To remove a database file from the cache use
retry clearDB rdb+file:Place.owl notry
and to empty the whole database use
retry cleanDB
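The interplay of the two flags can be pictured with a small Python sketch. It models the database as a dictionary keyed by query URI and is not the Retry implementation; the function retry, its parameters query_db and cache_db, and the fetch callback are illustrative assumptions.

```python
def retry(uri, fetch, db, query_db=False, cache_db=False):
    """Return the model for uri, optionally consulting and filling the cache."""
    if query_db and uri in db:
        return db[uri]            # cached result set found, nothing is fetched
    model = fetch(uri)            # fall through to the remaining mappings
    if cache_db:
        db[uri] = model           # store under the query URI as model ID
    return model

# Usage: count how often the source is actually fetched.
fetched = []
def fetch(uri):
    fetched.append(uri)
    return "model of " + uri

db = {}
retry("file:Place.owl", fetch, db)                                # both flags off
retry("file:Place.owl", fetch, db, cache_db=True)                 # fills the cache
retry("file:Place.owl", fetch, db, query_db=True, cache_db=True)  # served from cache
```

With both flags on, as suggested for editing sessions, the third call is answered from the cache and the source is fetched only twice in total.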
Directory index

The TF repository URI convention allows storing ontology subsets (for instance, individual entries or collections of entries) indexed under filename paths that reflect the position of the entry or collection in some alternative TF class taxonomy or other. Such a directory index can share structure through symbolic links, so that the same item(s) can be indexed under many alternative paths. One example is an index based on the concept URL as follows.

Building a URL directory index

An example of creating a directory index of precomputed entries is the following. The work is done by perl script urlindex in io/script :

pellet query -e ARQ -q classes.sparql ../owl/TFS.owl > TFS.uris
lists classes in TFS.owl
sudo -E $TF_HOME/io/script/urlindex TFS.uris
creates and populates the index

The resulting directory tree is shown below.

Figure: TF url index

For another example, an entry for China (the country) could be indexed by subject field under URI http://tfs.cc/ont/Domain/Geography/Country/China along with other entries for countries in the same folder. A (cross) classification can be multiply indexed by different sort orders and a deep one by different degrees of granularity using alternative directory paths, according to need and size of the collections. A third example is an indexing by subclass structure as in the example below.

  • Human
    • Boy
    • Girl
    • Man
    • Woman
    • Female
      • Girl
      • Woman
      • Adult
        • Woman
      • Young
        • Girl
    • Male
      • Man
      • Boy
      • Adult
        • Man
      • Young
        • Boy
    • Adult
      • Man
      • Woman
      • Female
        • Woman
      • Male
        • Man
    • Young
      • Boy
      • Girl
      • Female
        • Girl
      • Male
        • Boy
Building a class directory index

An example of building a directory index based on subclass hierarchy is the following. The work is done by three perl scripts in io/script :

classindex
builds a directory structure matching subclass hierarchy in /var/www under owl:Thing
classquery
runs TF DESCRIBE queries for all subclasses in the hierarchy
classlink
creates links from the directories in the index to the matching entries in owl:Thing

The steps to create the index structure are as follows.

pellet4tf classify ../owl/TFS.owl > TFS.classify
use pellet to classify the ontology
classindex TFS.classify > TFS.classindex
create a shell command file to create the directory tree
sudo sh TFS.classindex
pellet4tf query -e ARQ -q classes.sparql > TFS.classes
use pellet4tf to list the classes in the ontology
sudo -E $TF_HOME/io/script/classquery TFS.classes
use pellet4tf to create entries for all classes at the root of the directory tree
sudo -E $TF_HOME/io/script/classlink
link the entries at root to the directory tree

The resulting directory tree is shown below.

Figure: TF class index
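The idea behind the class directory index can be sketched in Python as follows. This is not a port of the classindex, classquery and classlink scripts; the function build_class_index, the ".owl" entry file names and the toy hierarchy are assumptions for illustration, and the hierarchy is assumed to be free of cycles. Each entry is stored once at the top of the tree, and every directory path that ends in a class holds a symbolic link to that class's entry.

```python
import os
import tempfile

def build_class_index(top_dir, subclasses, roots):
    """Mirror a subclass hierarchy as directories and link in the top entries."""
    def visit(cls, parent_dir):
        cls_dir = os.path.join(parent_dir, cls)
        os.makedirs(cls_dir, exist_ok=True)
        link = os.path.join(cls_dir, cls + ".owl")
        if not os.path.lexists(link):
            # The entry itself lives once at the top of the tree.
            os.symlink(os.path.join(top_dir, cls + ".owl"), link)
        for child in subclasses.get(cls, []):
            visit(child, cls_dir)
    for root in roots:
        visit(root, top_dir)

# Toy hierarchy: Woman is reachable both through Female and through Adult.
subclasses = {"Human": ["Female", "Adult"], "Female": ["Woman"], "Adult": ["Woman"]}
top_dir = os.path.join(tempfile.mkdtemp(), "Thing")
os.makedirs(top_dir)
for cls in ["Human", "Female", "Adult", "Woman"]:
    with open(os.path.join(top_dir, cls + ".owl"), "w") as f:
        f.write("entry for " + cls)
build_class_index(top_dir, subclasses, ["Human"])
```

The same entry for Woman can then be reached under Human/Female/Woman and under Human/Adult/Woman without being stored twice.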
CMS cache

A content management system like Drupal can hold the results of common queries (for instance, the established entries) in html rendering for browsing in its own story database.

Tomcat native library
http://blog.janjonas.net/2010-07-29/ubuntu-10_04-tomcat_6-improve-preformance-apache-tomcat-native-library

To improve the performance of the Apache Tomcat Server you could use the Apache Tomcat Native library. In Ubuntu, to install the library just execute:

# sudo aptitude install libtcnative-1

and add the following line to the file $CATALINA_HOME/bin/setenv.sh (create the file if it does not exist):

export LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH

When the library is running you will find the following message in $CATALINA_HOME/logs/catalina.out:

INFO: Loaded APR based Apache Tomcat Native library 1.1.20.

Without the Apache Tomcat Native library you will find a message like:

INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: ...

Indexing

Indexes provide faster ways of retrieving preprocessed and classified query results than running SPARQL queries against full ontologies. Again, there are many alternatives:

  • TF repository database index
  • TF repository directory
  • Separate indexes or registries
DB index

A TF database holds result sets of TF URI queries as Jena models using the URI of the query as model ID. One can populate the database with pre-fetched queries and use the database's indexing capabilities to retrieve relevant items from the db.

Repository directory index

A TF repository directory (the public section of the file system of a repository web server) forms an index that can be searched for matching documents. As a simple expedient, the QueryService repos parameter accepts directory URIs ending in a solidus (slash), of the form path/. Such a directory URI is resolved to the URI path/repos.php, which is expected to return a newline-separated list of URIs. This list of URIs is joined to the query dataset file list. (This directory resolution process does not apply recursively at present.)
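The directory resolution step can be sketched in Python as below. This is an illustration under assumptions, not the QueryService code: the function expand_repos and the fetch_text callback (which stands in for an HTTP GET returning the response body) are hypothetical names, and the sample listing is invented.

```python
def expand_repos(repos, fetch_text):
    """Replace each directory URI by the URIs listed in its repos.php."""
    dataset = []
    for uri in repos:
        if uri.endswith("/"):
            listing = fetch_text(uri + "repos.php")
            # One level only: listed URIs are taken as they are,
            # even if one of them again ends in a slash.
            dataset.extend(ln.strip() for ln in listing.splitlines() if ln.strip())
        else:
            dataset.append(uri)
    return dataset

# Usage with a canned listing instead of a live server.
listings = {"http://tfs.cc/owl/repos.php":
            "http://tfs.cc/owl/TFS.owl\nhttp://tfs.cc/owl/sub/\n"}
dataset = expand_repos(["file:Place.owl", "http://tfs.cc/owl/"], listings.__getitem__)
```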

We may propose the convention that a TF site's own public ontologies, those which it owns and serves to other sites, are at (accessible from) the repository's root directory / . Physically, they may be where they will, but they are linked to (redirected from) the site server root. They are a subset of the ontologies listed in its ont-policy.rdf . The site's ont-policy.rdf may list further ontologies owned by the site which are not public. A site's root repository can be specified as dataset with the site root URI http://localhost:/ (or just / ). (This assumes that the root's repos.php produces the appropriate listing.)

TF webdav indexing

TF IndexService provides a way of copying remote (or local) entries or ontologies to a local (or remote) WebDAV web directory.

Separate index

TF repositories can be indexed with generic text indexing tools (Google, Lucene).

Indexing and ontology factoring

OWL does not impose any built-in constraints on how a larger ontology should be split into smaller parts. Good TF practice may dictate some conventions. One natural division is between language independent concept ontologies (which may come from a third party source) and language dependent term ontologies, multilingual or language specific. This idea was already applied in the 4M ontology, which builds a hierarchy of increasingly specific domain ontologies starting from a set of common concepts through a series of sector specific ontologies (networking and diesel engines in 4M) to ontologies of company specific terms or product names (Windows XP and Wärtsilä ship diesels). Each language-independent ontology has corresponding term ontologies associated with it. The conversion of the BioCaster disease ontology to TF splits the BioCaster ontology into a number of domain specific ontologies (diseases, locations, etc.) plus a set of term ontologies in different languages to match the domain ontologies.

In TF Full, a further layer of domain independent expression ontologies suggests itself. An expression ontology might split into dialect independent and dialect specific parts, for instance, for English or Brazilian, or by expression type or part of speech.

The division of content by such natural dividing lines should allow focusing search for relevant content to those sub-ontologies that are most likely to contain it. A consistent naming discipline for the URLs of such sub-ontologies could obviate or alleviate the need of a separate index (catalog or registry) to tell where to look for resources.

For TF purposes, we want a naming discipline. What we would like to accomplish is to make a URI like http://tfs.cc/ont0/ctryCode resolve to a description of just that particular resource. Typically, this URL points to the result set of a DESCRIBE query on the resource.

Such entry documents form collections in a repository directory tree. The home collection of http://tfs.cc/ont0/ctryCode is the collection at http://tfs.cc/ont0/ . The URL http://tfs.cc/ont0/ctryCode should fetch a DESCRIBE result (entry) for concept ctryCode from that collection. This can be accomplished in different ways. The simplest solution is to store an entry at the location. Another way is to use server URL rewriting . A third way is to let http://tfs.cc/ont0/ctryCode point to a web directory. The directory holds cached versions of the entry in different formats. The directory index (index.php) returns one of the files in the directory.

The URL http://tfs.cc/ont/Domain/Geography/ is an example of an ontology URL that collects entries for geographical concepts. This indexing URI uses the TF domain hierarchy to define a path for indexing entry documents. Entry http://tfs.cc/ont/Place could be found indexed in this collection as http://tfs.cc/ont/Domain/Geography/Place. The advantage of such indexing is that concepts can be found with a minimum of machinery. A drawback is that we need machinery to make sure that changes in the indexing ontology are reflected in the indexing URIs. For automating the indexing, see IndexService.

A naming convention is then one possible indexing into the collection. It is too narrow to fix on just one index here. We should keep in mind that a resource can have many URIs pointing to it. One approach is to develop a naming convention based on the [scheme:][//authority][path][?query][#fragment] URI structure with three zones:

  • repository ID, which includes the scheme (if relevant), the authority and a (possibly empty) prefix of the path
  • classification index, which includes a redundant (possibly empty) middle zone of the path
  • subset ID, which includes the last ("filename") part of the path and query string (if any), or
  • resource name, which is the fragment.

If slash vocabulary is used, the resource name is the last path element.

An individual resource like a concept, term, or expression is named (in the sense of a given or local name) by the fragment. Each resource has a home site identified by the host and authority of a namespace. It is the only one that has authority (to authorise others) to create and delete resources in that namespace. A resource can be identified by many URIs in the home namespace, so it can have many aliases with different index zones in its home repository.

The convention given allows for parts in a resource URI which are sufficient but not necessary for identification, i.e. any ID of form [repoID][indexZone][subsetID][fragment] points to the same resource independent of what the index zone contains. The index zone can then constitute alternative directory trees for the repository ontologies/subsets that allow pre-selecting parts of the ontology according to different criteria. Different paths can also point to the same contents. The naming convention then provides another indexing device to keep down the complexity of ontology queries.
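As a rough illustration of the three zones, the Python sketch below splits a slash-vocabulary URI into repository ID, index zone and resource name, and reduces any alias to a URI without an index zone. The functions split_zones and canonical are hypothetical, and the repository prefix is passed in explicitly because nothing in the URI itself tells where the repository ID ends.

```python
def split_zones(uri, repo_prefix):
    """Split a slash-vocabulary URI into (repository ID, index zone, name)."""
    if not uri.startswith(repo_prefix):
        raise ValueError("URI is outside the repository: " + uri)
    rest = uri[len(repo_prefix):]
    # Everything before the last path element is the redundant index zone.
    index_zone, _, name = rest.rpartition("/")
    return repo_prefix, index_zone, name

def canonical(uri, repo_prefix):
    """Drop the index zone, which is sufficient but not necessary for identity."""
    repo, _, name = split_zones(uri, repo_prefix)
    return repo + name

repo = "http://tfs.cc/ont/"
indexed = "http://tfs.cc/ont/Domain/Geography/Place"
plain = "http://tfs.cc/ont/Place"
```

Here canonical(indexed, repo) and canonical(plain, repo) both give http://tfs.cc/ont/Place, which is the sense in which differently indexed aliases point to the same resource.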

Mirroring of ontology documents

It makes sense for web addressing orthodoxy that a site's (say tfs.cc) official own read only files are accessible directly at the site's document root, so that for tfs.cc for instance, the locations of its own ontology files are at

  • /owl/... for ontology files
  • /ont/... for concept entry files, etcetera

If a site mirrors another site, then read only copies of a mirrored site, say tfs.cc mirroring grapson.com, could be accessible as tfs.cc/grapson.com/...

A site may want to mirror edited but not yet official published versions of its own and other sites' documents in a web writable dav directory for middle term saving. These collections could be addressed thus:

  • tfs.cc/dav/localhost/... for own documents
  • tfs.cc/dav/grapson.com/... for other site's documents

This proposal creates an asymmetry because localhost is not on the path for a site's own official documents. We don't need or want tfs.cc/localhost/owl, because tfs.cc is the localhost. For the official names of resources, the shorter the address the better. But we prefer tfs.cc/dav/localhost/owl/... for the dav versions of a site's own files, instead of the shorter tfs.cc/dav/owl/... There are two (admittedly not completely knockout) arguments for this proposal. First, the symmetry avoids parsing errors, since the directory structure is not dependent on the naming. Second, it avoids omission errors when the local dav collection is to be mirrored elsewhere.

Content negotiation for entries

A TF resource URI like http://tfs.cc/ont0/ctryCode might get described by a TF DESCRIBE query http://localhost/TermFactory/query?uri=ont0:ctryCode. Say we want to store a copy of this description ("entry") in the grapson.com site filesystem so it can be retrieved by its URL. One convention is to save the entry as a file in one or more of the TF formats in the resource directory pointed at by http://grapson.com/tfs.cc/ont/ctryCode/ with the local name plus appropriate suffix, say http://grapson.com/tfs.cc/ont/ctryCode/ctryCode.xhtml. The place to code this (or some alternative) content negotiation strategy is in location mappings.

A TF site may want to mirror contents of another site in its own document index and use TF location mappings to redirect requests for these documents to the local copy. Retrieval is faster but some mechanism is needed to keep the copies up to date.

Cache updates

According to the design, TF repositories may cache copies of ontology models in a database. This raises the question how to tell upstream servers to invalidate cache items when an ontology has changed.

A partial solution for this is provided by location mappings. Use a location mapping to map a given persistent URL, say http://tfs.cc/owl/TFS.owl, first to a versioned URL like http://tfs.cc/owl/2.0/TFS.owl. When there is a version change, the persistent URL is mapped to a new version number, say http://tfs.cc/owl/2.1/TFS.owl. Queries for the persistent URL after the remapping will be looking up items whose URLs are not in the cache (DB items are cached against the last URL in the mapping), so they will be queried anew and the new version cached in the DB against the new version number. The older version stays in the cache, but it is no longer fetched, and nobody is the wiser.
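The effect of remapping on the cache can be sketched in Python. The sketch keeps location mappings and the cache in plain dictionaries and is not how Jena or Retry store them; resolve, lookup and the fetch callback are hypothetical names.

```python
def resolve(url, mappings):
    """Follow location mappings until no further mapping applies."""
    seen = set()
    while url in mappings and url not in seen:
        seen.add(url)               # guard against circular mappings
        url = mappings[url]
    return url

def lookup(url, mappings, cache, fetch):
    """Fetch through the cache, keyed by the last URL in the mapping chain."""
    key = resolve(url, mappings)
    if key not in cache:
        cache[key] = fetch(key)
    return cache[key]

persistent = "http://tfs.cc/owl/TFS.owl"
mappings = {persistent: "http://tfs.cc/owl/2.0/TFS.owl"}
cache = {}
fetch = lambda url: "contents of " + url

first = lookup(persistent, mappings, cache, fetch)
mappings[persistent] = "http://tfs.cc/owl/2.1/TFS.owl"    # version change
second = lookup(persistent, mappings, cache, fetch)
```

After the remapping, the second lookup misses the cache and fetches version 2.1, while the 2.0 item stays behind in the cache unused.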

However, an ontology query (for instance a TF DESCRIBE query) can also consult several ontologies and cache the result. If any one of the consulted ontologies changes, the cached result becomes stale. One way of solving this last problem is for TF DESCRIBE queries to collect owl:versionInfo triples from each consulted ontology and store them with the cached model, so as to know which ontology versions it has imported, and then to compare that list to the version info associated with the ont-policy file location mappings of the imported ontologies in the appropriate TF server(s). The following details how.

TF extends the syntax of Jena ont-policy.rdf files as shown in the following excerpt.

<OntologySpec>
  <!-- current version of TFS.owl -->
  <publicURI rdf:resource="http://tfs.cc/owl/TFS.owl" />
  <owl:versionInfo rdf:datatype="&xsd;string">TF-Schema version 0.1 21.01.2010</owl:versionInfo>
  <language rdf:resource="http://www.w3.org/2002/07/owl" />
</OntologySpec>

The new feature is the owl:versionInfo element which identifies the current version of the public URI on this server. (For now, it does not matter just what the versionInfo element says; the current checker only tests for string equality. So far, there is no provision for there being more than one versionInfo element per URI, or more than one ontology specification for a given uri in the ont-policy file.) The default TF DESCRIBE queries etc/tfi.sparql and etc/tfc.sparql are set up to collect versionInfo elements to the query result. A model cached in the database might begin as follows:

<rdf:RDF xml:base="http://tfs.cc/owl/TFS.owl" >
  <owl:Ontology rdf:about="">
    <owl:versionInfo rdf:datatype="&xsd;string">version 0.0 31.10.2008</owl:versionInfo>
  </owl:Ontology>
  ...

When TF option TF_VERSIONINFO is true, the TF location mapping retry facility compares the owl:versionInfo triples found in the database model to those specified in ont-policy.rdf . If some URI occurs both in the cached model and in the ont-policy file associated to different owl:versionInfo values, retry warns about this with a message like

WARN ... TF versionInfo clash reading http://tfs.cc/owl/TFS.owl at URI http://tfs.cc/owl/IMP.owl: ont-policy info 'version 0.1 21.01.2010^^http://www.w3.org/2001/XMLSchema#string' vs. cached info 'version 0.0 31.10.2008^^http://www.w3.org/2001/XMLSchema#string'
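The comparison that leads to such a warning can be sketched in Python. The sketch assumes the versionInfo values have already been extracted into dictionaries keyed by ontology URI; the function version_clashes is a hypothetical name and the sample values are made up for illustration. Like the checker described above, it only tests the strings for equality.

```python
def version_clashes(policy_info, cached_info):
    """List (uri, policy value, cached value) where the two sources disagree."""
    return [(uri, policy_info[uri], cached)
            for uri, cached in sorted(cached_info.items())
            if uri in policy_info and policy_info[uri] != cached]

policy_info = {"http://tfs.cc/owl/TFS.owl": "version 0.1 21.01.2010"}
cached_info = {"http://tfs.cc/owl/TFS.owl": "version 0.0 31.10.2008",
               "http://tfs.cc/owl/IMP.owl": "version 1.0"}   # not in ont-policy
clashes = version_clashes(policy_info, cached_info)
```

A URI that occurs only in the cached model, like IMP.owl here, produces no clash, since there is nothing in the policy file to compare it with.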

The above solutions apply to the case where the database and ontologies are on the same repository server. However, our server may cache items imported from a version of some resource on another server. Our user may get an old version of the other resource whose version info is not on our ont-policy. To allow cross-server checks, we need to make (some version of) the ont-policy file of each server accessible from the web.

Tools

TF terminologies are maintained with the help of various user interfaces and tools. These include TermFactory specific tools, collaborative terminology platforms with TF plugins, and third party professional ontology tools.

TF tools

The TermFactory specific tools include

retry
locates ontologies
rewrite
changes URIs and i/o formats
pellet4tf
does ontology queries
edit4tf
does ontology editing
index4tf
copies ontologies across locations
TF tool workflows

This slide shows how the different tools play together. Retry locates entries and documents in the TF cloud. Rewrite converts between formats and does refactoring of names. Pellet4TF is the query engine and reasoner. Edit4TF supports ontology editing. Index4TF handles cross copying of entries and documents to TermFactory repositories. A clockwise roundtrip on the perimeter of the figure traces the revisioning cycle of an ontology/entry through TF.

Figure: TF tools

Retry utility

com.grapson.tf.rev.jena.Retry.java is a java class for applying TF location mappings. The TF ontology model class OntModel4TF reader is set up to use Retry. There is a command line wrapper script retry in io/script . This utility helps testing a site's location mapping and caching policy. Options as shown below.

retry
<URL> get URL (or other identifier) from location mappings
notry no redirection (default false)
(query|cache)(DB|FS)=true|false look or save in remote database or local file system database
lmap=<URI>? lmaps=<URI>* use given location map, add given location maps
url | (readAll|writeAll)=(true|false) | editAll return url only, read all triples, write all triples, include readOnly triples from xhtml
<format> output format (options as in rewrite utility)
template=<URI>
root=<URI>
schema=<URI>
active=<URI>
original=<URI>
locals=<URI>
lang=<ISO langCode>
links=<URI>
rw=deblank|reblank|relabel...
verbose
file=true|false
list(DB|FS) list models in remote or local database
<regexp> whose modelID matches regexp (without regexp list all)
verbose
put(DB|FS) put model to remote or local database
<URL> URL of model to put
<modelID=<modelID>>? modelID if different from URL
verbose
clear(DB|FS) remove models from database or filesystem
<regexp> whose modelID matches regexp (without regexp do nothing)
verbose
cleanDB remove all models from database or filesystem
get(DB|FS) get model from database whose modelID is URI
<modelID> modelID of model to get
size(DB|FS) get size of model from database whose modelID is URI
<modelID> modelID of model to get

The retry utility database tasks manipulate the relational database. The corresponding tasks listFS, delRFS, clearFS, cleanFS, putFS, getFS manipulate the jena TDB native database held in the local filesystem. The location mapping task url tests location mappings to retrieve TF addresses. retry url foo just tries to locate and print the resolved location of foo. retry getDB foo only looks for foo in the database. It does the same as retry notry rdb+foo . Unless explicit format is given, the format of a cached model is guessed from the modelID. To get expected xhtml, xhtml entry write parameters may need to be supplied.

The url task tries to fetch the url using location mappings. Option file=true tells a command line utility to resolve subsequent schemeless file path parameters against the current working directory before passing them along. For instance, retry file=true foo says the same as retry file:$PWD/foo. Here are some examples of usage:

retry queryDB=false http://tfs.cc/owl/TFS.owl
apply mappings, skip database
retry clearDB http://tfs.cc/owl/TFS.owl
remove source from cache
retry listDB file:\w{8}[-]\w{4}[-]\w{4}[-]\w{4}[-]\w{12} verbose
list modelIDs matching UUID format
retry http://tfs.cc/owl/TFS.owl TURTLE
locate TFS schema ontology and write it in TURTLE format
retry url http://tfs.cc/owl/TFS.owl
find where TFS.owl gets mapped
retry url lmaps=wnlmaps.txt wn30:synset-1530s-noun-1
find where wn30:synset-1530s-noun-1 gets mapped with default location mappings plus location mapping files listed in file wnlmaps.txt
retry locals.json schema=TFS.owl TURTLE rw=relabel
convert a TF localization json file back to a TF ontology
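What option file=true does to a schemeless path can be illustrated with a few lines of Python. This is only a sketch of the described behaviour, not code from the retry wrapper script; resolve_file_arg is a hypothetical name, and anything that parses as having a URI scheme (including a prefixed name like wn30:...) is assumed to be left alone.

```python
import os
from urllib.parse import urlsplit

def resolve_file_arg(arg, cwd=None):
    """Turn a schemeless path into a file: URI under the working directory."""
    if urlsplit(arg).scheme:
        return arg                          # already has a scheme, keep as is
    base = cwd if cwd is not None else os.getcwd()
    return "file:" + base.rstrip("/") + "/" + arg

resolved = resolve_file_arg("foo", cwd="/home/tf")
untouched = resolve_file_arg("http://tfs.cc/owl/TFS.owl", cwd="/home/tf")
```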

With the url option, retry just locates the document and returns its location. With readAll=false retry downloads the document it finds. It downloads any kind of file, not just RDF models. If an RDF format is given and does not match the extension of the document at the location, retry tries to read the downloaded document as RDF and convert it into the requested format. With readAll=true, retry always tries to read the document into an RDF model and include its imports if any. In contrast, pellet4tf query -u always reads the requested document as RDF and includes imports unless option --ignore-imports is given.

Uniform Resource Identifiers (URIs) may globally identify a resource (they point to just one thing in the world), but not all of them are related to Uniform Resource Locators (URLs) identifying a web source holding a description of that resource. By the Semantic Web addressing orthodoxy, the usual RDF resource URI naming convention (a URL plus a fragment identifier) suggests that a resource URI should be described in a document obtained from the given URL at a location pointed to by the fragment identifier. This in general is not practical. Web servers only serve complete URL documents; fragments are located at the client end. The whole ontology would have to be downloaded to access the description of just one URI in it.

This is why web addressing orthodoxy is not completely true of TF either. A TF URI is related to a URL describing it by the TF retry facility. Retry option uri2url shows the file URL of a given TF resource for given links and format. Bona fide URLs without a fragment hash should stay the way they are; the rest get prefixed with links and suffixed with a file extension matching the given format. The url2url option tests where Retry location mapping finds the first available copy of a given TF file. Only the availability of files is tested; the file is not downloaded.

Below is a schematic picture of URI and URL redirection in TF.

Figure: TF url redirection

TF does URL redirection using Retry on jena location mappings. The web server (apache2 for instance) or the web application container (Tomcat for instance) can do their own URL rewriting.

Rewrite utility

com.grapson.tf.rev.jena.Rewrite.java is a java class for rewriting TF ontologies in different formats, and maintaining and changing TF term, expression, concept and individual URI's. It reads an ontology file and rewrites it adding labels to these TF objects and/or rewriting URIs of these objects. There is a shell script wrapper to Rewrite.java in io/script by name rewrite .

RDF/OWL node identifiers can be chosen arbitrarily. It may make sense to choose a fixed arbitrary identifier for a resource, so that all of its properties can change without losing its identity. An editor may show the user a node's label instead of (or alongside) its ID, or let the user tell the editor how to construct a mnemonic name for a node from its properties. But for manual inspection of an ontology in text form, or when transferring data across a variety of converters and editors, it may help to use less arbitrary, descriptive identifiers for entities. Numeric IDs are very easy to get wrong and the errors are hard to catch. TF tries to satisfy both conflicting needs with the Rewrite utility. Rewrite can be run with shell script io/script/rewrite. Query and edit utilities and services can apply one or more of certain labeled rewrite operations. Currently, the labeled operations are deblank, reblank, and relabel, which remove and restore blank nodes and create descriptive labels for anonymous TF resources, respectively.

Usage: rewrite raw | label | blanks | id=<int> remove replace create base=<URI> label=<URI> in=<URI> out=<URI> uuid urlencode | urldecode | tf3encode | tf3decode | normalize <source> ... <format> (readAll|writeAll)=(true|false) | editAll trim imports | queryFS=true|false queryDB=true|false cacheDB=true|false notry help format file=true|false verbose

Minimally, rewrite reads one or more TF files and prints a TF model in different formats. When there are more than one input file, it merges the contents of the files into one model. Help on formats is given with

rewrite help format
Rewrite output formats: RDF/XML RDF/XML-ABBREV N3-PP TURTLE TF3 XHTML template=<URI> root=<URI> schema=<URI> active=<URI> original=<URI> edits=<URI> locals=<URI> lang=<ISO langcode> links=<URI> JSON TBX

For the XHTML format parameters, see the section on the XHTML format.

If a source consists of a TF filetype only (e.g. ".owl"), rewrite reads the source from standard input in the indicated syntax. It is then possible to use rewrite as a filter (e.g. cat test | rewrite .tf3 converts test from tf3 into rdf/xml).

Beyond that, rewrite allows rewriting resource URIs in various ways governed by the following option switches. Switch in= gives the prefix of URIs that come in (get factored out) in the rewrite. Switch label= gives the URI of a label (string or annotation) property used by rewrite. It defaults to meta:label . Switch base= gives the URI of a string property used for the lemma of the expression. It defaults to exp:baseForm . Option remove tells rewrite to remove any old label properties from the affected instances. If switch out= is given, new URIs that get created are factored into this namespace. If not, the default is the instance's original namespace. Only named resources are refactored. Task option blanks turns blanks into named resources.

Option create makes rewrite create new URIs for TF resources. What kind of URIs get created depends on the task, which can be one of id and label. With option create id, rewrite forms sequential numerical ids. With option create label, rewrite tries to form descriptive URIs for expressions and terms on the basis of their language, base form, and referent properties.

Option uuid makes rewrite turn the new URIs into uuids. The other encoding and decoding options try applying the corresponding character conversions on the URIs.

Option replace makes rewrite actually replace URIs of the affected resources. The URIs are taken from the label properties of the instances (if any) unless create is chosen. In that case, newly created URIs are used in the renaming. In either case, unless remove is specified, the old URI is saved as a label property of the renamed instance.

For expressions and terms, a descriptive URI is generated only when the entity has the requisite property values (langCode, catCode, and baseForm or romanisation in case of expressions; hasDesignation, hasReferent in the case of terms) and the namespace prefixes for the designation and referent namespaces are known.

The following command line provides descriptive URIs for blank node expressions and terms:

rewrite create label replace sourcefile > targetfile

Option readAll forces a full read of an RDF file.

Option writeAll outputs all statements in the model and its imports.

Option editAll includes readonly triples from an XHTML file.

Option trim trims off namespace prefixes for namespaces that are not used in the model.

Option imports prints out the names of the ontologies imported by the sources.

Command rewrite raw tf3encode source.tf3 tf3-encodes a triple file. This can be useful as a step in conversion.

Command rewrite raw normalize source.owl normalizes a file into unicode normal form C. It may be necessary to normalize incoming unicode for jena.
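To show what normalization to form C changes, here is a small Python sketch using the standard unicodedata module. It only illustrates the effect of NFC on a string; it says nothing about how rewrite or Jena handle the text internally.

```python
import unicodedata

def to_nfc(text):
    """Return text in Unicode normal form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)

# 'e' followed by U+0301 COMBINING ACUTE ACCENT, twice: a decomposed spelling.
decomposed = "Prote\u0301ge\u0301"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))    # 9 7
print(composed == "Prot\u00e9g\u00e9")   # True
```

The two spellings look the same on screen but compare as different strings, which is why unnormalized input can produce apparently duplicate labels or URIs.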

Protege 4 TURTLE reader (build 4.0.114) breaks on TURTLE string escape \" . Use triple quoted multiline strings?

Here is the round trip from an RDF/XML or TURTLE format ontology file to TF3 form and back.

  1. Read an RDF/XML or TURTLE ontology file with rewrite and write it out in TF3
    rewrite TFS.owl TF3 > TFS.ttl
  2. Read the TF3 ontology with rewrite and write it out in RDF/XML. This form is equivalent to the original ontology.
    rewrite TFS.ttl > TFS.owl

Usage examples:

rewrite TFS.owl replace label urlencode TF3

The example command prints to standard output a relabeled version of TFS.owl in TermFactory triple format with entity URIs URL-encoded.

The following command line rewrites a turtle TF3 encoded ontology into RDF/XML decoding the TF3 encoded items.

rewrite file:puls-locations.ttl replace label tf3decode > puls-locations.owl

(version 1.9) The rewrite utility can also be run through the TF query services. An example query string is http://localhost:8080/TermFactory/query?uri=blank.owl&how=relabel . So far, only task relabel is covered, for use in TF internal profile conversions. See TF query string parameters .

The TF reasoner Pellet4TF

(version 1.0) The pellet reasoner command line tool Pellet has been adapted to carry out TF query and classify tasks. The new TF query engine is packaged in io/lib/tf-io.jar . It can be run using command line script io/script/pellet4tf.sh:

Pellet4TF classify adds the RDF/XML output format, and the ability to query the class tree round a given seed up and down to a given depth.

script/pellet4tf.sh help classify
PelletClassify for TF: Classify a TF ontology [around a seed class] and display the hierarchy
Usage: <pellet> classify [options] <file URI>...
Argument description:
--loader, -l (Jena | OWLAPI | KRSS)
    Use Jena, OWLAPI, or KRSS to load the ontology (Default: OWLAPI)
--input-format (RDF/XML | N3 | N-TRIPLE)
    Format of the input file (valid only for the Jena loader). Default behaviour is to guess the input format based on the file extension.
--output-format, -o (RDF/XML | text)
    Format of the output. (Default: text)
--seed, -s (C)
    One class URI or local name to build a taxonomy for. Example: "Animal"
--up, -u (I)
    Number of levels up from seed in classification tree. Default -1 (all)
--down, -d (I)
    Number of levels down from seed in classification tree. Default -1 (all)
--help, -h
    Print this message

Here is an example of using pellet4tf classify. TF prefixes are not supported (yet), so the full resource URI must be used as the seed.

lcarlson@lhc:~/Data/CF/TF/io/ pellet4tf classify -u -1 -d -1 -s http://tfs.cc/ont/Language_industry ../owl/TFS.owl
tree print http://tfs.cc/ont/Language_industry up -1 down -1
http://tfs.cc/ont/Language_industry
http://tfs.cc/ont/Language_technology
http://tfs.cc/ont/Multilingual_language_technology
http://tfs.cc/ont/Linguistics
http://tfs.cc/ont/Terminology
http://tfs.cc/ont/Language_industry
http://tfs.cc/meta/Domain
http://tfs.cc/meta/Meta
_TOP_
http://tfs.cc/sem/Place
http://tfs.cc/sem/Role
http://tfs.cc/sem/Meaning
http://tfs.cc/meta/Object
_TOP_
lcarlson@lhc:~/Data/CF/TF/io/

Pellet4TF query adds to pellet a facility for querying persistent ontologies. More concretely, this means that it can save ontology files into Jena relational databases and fetch ontologies from them. An ontology url is allowed to have the form of a Java database connection url, for example

jdbc:mysql://localhost/rdb&user=tfuser&url=http%3a%2f%2ftfs.cc%2fowl%2fTFS.owl

The name of the ontology is given in the query parameter named uri in the url of the database connection. This kind of a persistent-ontology uri can also be given as a redirection entry ( AltURL resource) in a Jena policy file ont-policy.rdf . PelletQuery4TF also understands the following kind of abbreviation:

rdb+http://tfs.cc/owl/TFS.owl
rdb+file://home/user/TF/owl/TFS.owl

Given a pseudo uri of this form, Pellet4TF tries to load the ontology uri after the rdb+ prefix from a pre-set database connection URL specified as option TF_DBCONNNECTION in pellet properties file etc/tf.properties . The default connection is jdbc:mysql://localhost/rdb .
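The expansion of the abbreviation can be sketched in Python as follows. This is only an illustration of the string manipulation involved, not the Pellet4TF code: the function name expand_rdb is hypothetical, and the parameter names and the user tfuser are simply copied from the example connection URL shown above.

```python
from urllib.parse import quote

# Default connection named in the text (the value of the connection option).
DEFAULT_CONNECTION = "jdbc:mysql://localhost/rdb"

def expand_rdb(pseudo_uri, connection=DEFAULT_CONNECTION, user="tfuser"):
    """Expand an rdb+ pseudo URI into a full persistent-ontology URL.

    The layout of the result follows the example URL in the text;
    the user name is an assumption taken from that example.
    """
    prefix = "rdb+"
    if not pseudo_uri.startswith(prefix):
        raise ValueError("not an rdb+ pseudo URI: " + pseudo_uri)
    ontology = pseudo_uri[len(prefix):]
    # Percent-encode the ontology URI so it can travel inside another URL.
    return "{}&user={}&url={}".format(connection, user, quote(ontology, safe=""))

print(expand_rdb("rdb+http://tfs.cc/owl/TFS.owl"))
```

Python writes percent escapes with uppercase hex digits ( %3A rather than %3a ); the two spellings are equivalent.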

Usage: <pellet> query [options] <file URI>...
Argument description:
--help, -h
    Print this message
--verbose, -v
    Print full stack trace for errors.
--config, -C (configuration file)
    Use the selected configuration file
--query-file, -q (<file URL>)
    Read the SPARQL (or RDQL) query from the given file
--url, -u (<URL>)
    URL to look up using retry.
--uri, -U (<URI or QName>)
    URI to look up or describe.
--describes, -D (<resource URIs or TF QNames>)
    List of resource URIs to describe. Pellet4TF query implements DESCRIBE query for TermFactory.
--output-format, -o (Tabular | XML | JSON | TF3 | RDF/XML | RDF/XML-ABBREV | XHTML | TURTLE)
    Format of result set: Tabular through JSON for SELECT or ASK queries, rest for CONSTRUCT or DESCRIBE queries
--matrix, -M (<file URL>)
    model to receive query results
--minus, -m
    subtract query result from matrix
--template, -T
    XHTML entry template
--root, -R
    XHTML entry root
--depth, -r
    DESCRIBE query recursion depth (Default: -1)
--schema, -S
    XHTML schema ontology
--axioms, -X
    Stacked engine axioms ontology
--active, -A
    XHTML active ontology
--original, -O
    XHTML original ontology
--locals, -L
    XHTML localization ontology
--lang, -l
    XHTML localization language
--links, -H
    XHTML hyperlinks prefix
--output-file, -f
    Output file.
--output-encoding, -c
    Output encoding (UTF-8 or UTF-16). (Default: UTF-8)
--query-format (SPARQL | ARQ | RDQL)
    The query format (Default: SPARQL)
--input-format (RDF/XML | Turtle | N-Triples)
    Format of the input file (valid only for the Jena loader). Default behaviour is to guess the input format based on the file extension.
--ignore-imports
    Ignore imported ontologies
--display-query, -d
    Display the input query
--sites-flag, -s
    Broadcast query to other sites
--query-engine, -e (Pellet | ARQ | Mixed | SPARQL | Stacked)
    The query engine that will be used. Default behavior is to auto select the engine that can handle the given query with best performance. Pellet query engine is the typically fastest but cannot handle FILTER, OPTIONAL, UNION, DESCRIBE or named graphs. Mixed engine uses ARQ to handle SPARQL algebra and uses Pellet to answer Basic Graph Patterns (BGP) which can be expressed in SPARQL-DL. ARQ engine uses Pellet to answer single triple patterns and can handle queries that do not fit into SPARQL-DL. As a consequence SPARQL-DL extensions and complex class expressions encoded inside the SPARQL query are not supported. SPARQL is just plain Jena ARQ without pellet. Stacked engine first runs sparql to get a smaller model and then runs Pellet on that model plus a separate schema (--axioms).
--bnode
    Treat bnodes in the query as undistinguished variables. Undistinguished variables can match individuals whose existence is inferred by the reasoner, e.g. due to a someValuesFrom restriction. This option has no effect if ARQ engine is selected.
--timing, -t
    Print detailed timing information
--queryFS
    query from FS cache
--cacheFS
    save to FS cache
--queryDB
    query from DB cache
--cacheDB
    save to DB cache
--notry, -n
    no location mapping
--nowait, -w
    wait for results

pellet4tf help query
PelletQuery4TF: SPARQL-DL Query Engine for TermFactory
Usage: <pellet> query [options] <file URI>...
Argument description:
--query-file, -q (<file URL>)
    Read the SPARQL (or RDQL) query from the given file
--url, -u (<URL>)
    URL to look up using retry.
--uri, -U (<URI or QName>)
    URI to look up or describe.
--describe-resource, -D (<resource URIs or TF QNames>)
    List of resource URIs to describe. Pellet4TF query implements DESCRIBE query for TermFactory.
--output-format, -o (Tabular | XML | JSON | TF3 | RDF/XML | RDF/XML-ABBREV | XHTML | TURTLE)
    Format of result set: Tabular through JSON for SELECT or ASK queries, rest for CONSTRUCT or DESCRIBE queries
--matrix, -M (<file URL>)
    model to receive query results
--minus, -m
    subtract query result from matrix
--entry, -E
    XHTML entry type
--depth, -r
    DESCRIBE query recursion depth
--schema, -S
    XHTML schema ontology
--axioms, -X
    Stacked engine axioms ontology
--active, -A
    XHTML active ontology
--original, -P
    XHTML original ontology
--locals, -L
    XHTML localization ontology
--lang, -l
    XHTML localization language
--links, -p
    XHTML links prefix
--output-file, -f
    Output file.
--output-encoding, -c
    Output encoding (UTF-8 or UTF-16). (Default: UTF-8)
--query-format (SPARQL | ARQ | RDQL)
    The query format (Default: SPARQL)
--input-format (RDF/XML | N3 | N-TRIPLE)
    Format of the input file (valid only for the Jena loader). Default behaviour is to guess the input format based on the file extension.
--display-query, -d
    Display the input query
--sites-flag, -s
    Broadcast query to other sites
--query-engine, -e (Pellet | ARQ | Mixed | SPARQL | Stacked)
    The query engine that will be used. Default behavior is to auto select the engine that can handle the given query with best performance. Pellet query engine is the typically fastest but cannot handle FILTER, OPTIONAL, UNION, DESCRIBE or named graphs. Mixed engine uses ARQ to handle SPARQL algebra and uses Pellet to answer Basic Graph Patterns (BGP) which can be expressed in SPARQL-DL. ARQ engine uses Pellet to answer single triple patterns and can handle queries that do not fit into SPARQL-DL. As a consequence SPARQL-DL extensions and complex class expressions encoded inside the SPARQL query are not supported. SPARQL is just plain Jena ARQ without pellet. Stacked engine first runs sparql to get a smaller model and then runs Pellet on that model plus an axioms file.
--bnode
    Treat bnodes in the query as undistinguished variables. Undistinguished variables can match individuals whose existence is inferred by the reasoner, e.g. due to a someValuesFrom restriction. This option has no effect if ARQ engine is selected.
--timing, -t
    Print detailed timing information
--verbose, -v
    Implies -t -d
--queryDB, -Q
    query from DB cache
--cacheDB, -C
    save to DB cache
--notry, -n
    no location mapping
--help, -h
    Print this message

SPARQL Engines:

  • ARQ
    • ARQ handles the query execution
    • Calls Pellet with single triple queries
    • Supports all SPARQL constructs
    • Does not support OWL expressions
  • Pellet
    • Pellet handles the query execution
    • Supports only Basic Graph Patterns
    • Supports OWL expressions
  • Mixed
    • ARQ handles SPARQL algebra, Pellet handles Basic Graph Patterns
    • Supports all OWL and SPARQL constructs

The pellet4TF SPARQL engine uses the TF ontology loader, unlike the original Jena sparql. As an RDF tool, sparql does not understand ontology imports. For instance, the multipart ontology epi.owl cannot be directly queried with Jena sparql, but it can be so queried with the pellet4TF SPARQL engine.

sparql --data=file:../owl/epi/epi.owl --query=s-inst.sparql
-------------------------------------------
| inst                                    |
===========================================
| <http://tfs.cc/owl/epi/epi.owl>         |
-------------------------------------------

vs.

time pellet4tf query -e SPARQL -q ../../io/s-all.sparql epi.owl
...
Query Results (220977 answers):
...

The default value of the sites-flag parameter from the command line is false. A query gets broadcast only if the -s flag is given on the command line.

The -D parameter allows shorthands like the following:

pellet4tf query -D exp:Language
pellet4tf query -D "exp:Language exp0:Language"
pellet4tf query -D http://tfs.cc/ont/Country
pellet4tf query -D "<http://tfs.cc/ont/Country>"

Prefixed names like exp:Language work for prefixes defined in etc/prefix.sparql , otherwise use full URI. Angle brackets around a URI are optional. If they are used, better quote the string to avoid ambiguity.
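The resolution of such a -D argument can be pictured with the following Python sketch. It is an illustration under assumptions, not the Pellet4TF code: the function name is hypothetical, and the prefix table stands in for the contents of etc/prefix.sparql , whose actual namespace URIs are not given here.

```python
# Hypothetical stand-in for the prefixes defined in etc/prefix.sparql.
PREFIXES = {
    "exp": "http://tfs.cc/exp/",   # assumed namespace URI
    "ont": "http://tfs.cc/ont/",
}

def expand_describe_arg(arg, prefixes=PREFIXES):
    """Turn a whitespace separated -D argument into a list of full URIs."""
    uris = []
    for token in arg.split():
        if token.startswith("<") and token.endswith(">"):
            # Angle brackets around a URI are optional: strip them.
            uris.append(token[1:-1])
        elif "://" in token:
            # Already a full URI.
            uris.append(token)
        else:
            # Prefixed name: look the prefix up in the table.
            prefix, sep, local = token.partition(":")
            if not sep or prefix not in prefixes:
                raise KeyError("unknown prefix, use a full URI: " + token)
            uris.append(prefixes[prefix] + local)
    return uris

print(expand_describe_arg("exp:Language"))
print(expand_describe_arg("<http://tfs.cc/ont/Country>"))
```

The test for a full URI is deliberately crude (it looks for :// ), so scheme-only forms such as file:name would need extra handling in a real implementation.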

When a matrix model URL is given, CONSTRUCT or DESCRIBE query results are merged to the matrix model and the merger returned. This option is useful when a model needs to be enriched with reasoner results; for instance, when doing format conversion with queries. Example: pellet4tf query -q explit.sparql -M test.tf3 test.tf3 adds the result of query explit.sparql to test.tf3 .

When a matrix URL and flag -m are given, CONSTRUCT or DESCRIBE query results are removed from the matrix model and the matrix minus the results is returned. This option is useful when a model needs to be modified with reasoner results; for instance, when doing format conversion with queries. Example: pellet4tf query -q explit.sparql -M -m test.tf3 test.tf3 removes the result of query explit.sparql from test.tf3 .
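Viewed as operations on sets of triples, the two matrix options amount to a union and a difference. The Python sketch below makes this concrete; the function name and the sample triples are invented for the illustration, and models are simplified to plain sets of (subject, predicate, object) tuples.

```python
def apply_matrix(matrix, result, minus=False):
    """Combine a query result with a matrix model.

    Without minus the result is merged into the matrix (set union);
    with minus the result is removed from it (set difference).
    """
    return matrix - result if minus else matrix | result

# Invented sample data: a matrix model and a CONSTRUCT/DESCRIBE result.
matrix = {(":a", "rdf:type", ":Term"), (":a", ":label", "entity")}
result = {(":a", ":label", "entity"), (":a", ":inferred", ":b")}

merged = apply_matrix(matrix, result)
reduced = apply_matrix(matrix, result, minus=True)
print(len(merged), len(reduced))
```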

A Pellet Mixed engine DESCRIBE query on anything but small datasets is often too slow, typically due to the inefficiency of realization (rdf:type inference). The TermFactory specific STACKED engine is an attempt to remedy this. The Stacked engine runs a first round DESCRIBE query with the Jena SPARQL engine for the arguments of the input query, then runs the main query with the Pellet MIXED engine using as dataset the results of the first round. A schema ontology to reason with on the second round can be specified with property TF_AXIOMS or an axioms=... query string or --axioms=... command line parameter.

The first stage uses the more efficient SPARQL rdf query engine to select a set of asserted triples from a large ontology. The second stage uses as input the selected triples plus the supplied schema ontology to run the slower Pellet MIXED engine query on this (hopefully smaller but still sufficient) dataset. For instance, if the input query is a SELECT query, the pre-query replaces SELECT with DESCRIBE for the first round and applies the SELECT query to the result of the DESCRIBE query.

The Stacked engine is sound but not safe: it may miss long distance entailments entailed by the original dataset but not by the extract made by the pre-query. For example, an instance base with statements :a :lt :b . :b :lt :c . and schema :lt a owl:TransitiveProperty . entails triple :a :lt :c . This statement will be included in DESCRIBE :a under the Mixed engine, but not under the Stacked engine, if :b :lt :c does not happen to get included in the pre-query DESCRIBE result. With unlimited DESCRIBE depth, Stacked engine is safe, but then the extracted model easily gets to be too big to allow any useful savings. (Compare pellet modularity.)
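The missed entailment in this example can be reproduced with a toy model in Python. The sketch does not use Pellet or Jena: the reasoner is reduced to a transitive closure over one property, and DESCRIBE is simplified to "all triples with the resource as subject", both of which are assumptions made only for the illustration.

```python
def transitive_closure(triples, prop):
    """Add all triples entailed by the transitivity of prop."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        edges = [(s, o) for (s, p, o) in closed if p == prop]
        for (a, b) in edges:
            for (c, d) in edges:
                if b == c and (a, prop, d) not in closed:
                    closed.add((a, prop, d))
                    changed = True
    return closed

def describe(triples, resource):
    """Simplified depth-limited DESCRIBE: triples with resource as subject."""
    return {t for t in triples if t[0] == resource}

data = {(":a", ":lt", ":b"), (":b", ":lt", ":c")}

# Reasoning over the whole dataset (the Mixed engine case).
full = transitive_closure(data, ":lt")
# Reasoning over the pre-query extract only (the Stacked engine case).
extract = transitive_closure(describe(data, ":a"), ":lt")

print((":a", ":lt", ":c") in full)      # True
print((":a", ":lt", ":c") in extract)   # False
```

Because :b :lt :c never reaches the second stage, the reasoner has nothing to chain with, which is exactly the incompleteness described above.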

edit4tf

The TF editing back-end is implemented with class grapson.tf.rev.jena.Edit4TF. It can be run from command line using the script edit4tf in io/script :

Usage: edit4tf querystring
operand
    URI, string, or - for stdin
del | add | ed | pop | rw=relabel|deblank|reblank|...
    postfix operator
verbose
notry
    no location mapping
queryFS=true|false
    query local filesystem database. Default true.
queryDB=true|false
    query remote database. Default false.
cacheDB=true|false
    save to database. Default false.
readAll
    reads rdf file including imports
writeAll
    outputs all statements in the model and its imports.
editAll
    includes xhtml content that is marked readonly into xhtml read.
rewrite=relabel|deblank|reblank|...
<format>
    give output format as in rewrite utility.
template=<URI>
    output template for xhtml entry write
root=<URI>
    root filter (one or more instances/classes) for xhtml entry write
schema=<URI>
    schema ontology for xhtml entry write
active=<URI>
    active ontology for xhtml entry write
original=<URI>
    original ontology for xhtml entry write
edits=<URI>
    edited ontology for xhtml entry write
locals=<URI>
    localisation ontology for xhtml entry write
lang=<ISO langcode>
    localisation language code for xhtml entry write
links=<URI>
    hyperlink prefix for xhtml entry write

The querystring argument is identical to the edit service edit operation querystring. This allows testing query service operations offline. A rewrite operation given as argument of form rewrite= is carried out last. Arguments in the form rw= are carried out where they occur in the edit operation stack.

Edit operations are read in postfix (reverse Polish) notation: operations add and del pop the last two operands from the stack and carry out the indicated operation (add the second model to the first, or subtract it from the first). Operation ed pops three operands and carries out the more complex edit operation described below. The remaining parameters can be freely interspersed. Here is an example of command line scripting the edit API. entity.ttl is a bilingual WordNet entry and entity-fi.ttl is its editable Finnish language content (the active model). entity-edits.ttl is the result of changing occurrences of the Finnish word kokonaisuus 'whole' in entity.ttl to the more appropriate word olio 'being, entity'.

edit4tf entity-fi.ttl entity.ttl entity-edits.ttl del del entity-edits.ttl entity.ttl del add > entity-fi-edited.ttl

deletes from entity-fi.ttl what entity-edits.ttl deleted from entity.ttl and then adds what it added.

edit4tf entity.ttl entity.ttl entity-edits.ttl del del entity-edits.ttl entity.ttl del add > entity-edited.ttl

does the same to entity.ttl .

rewrite XHTML entity-edited.ttl active=entity-fi-edited.ttl schema=../owl/wn/TFwn.owl > entity-edited.xhtml

lays out entity-edited.ttl in XHTML for further editing against the updated Finnish file entity-fi-edited.ttl as active model.

Operation ed telescopes the edits on the three lines above into one operation.
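The postfix evaluation of add and del described above can be sketched in Python as follows. This is a toy model, not Edit4TF: models are reduced to sets of triples held in a dictionary, the function name and the sample triples are invented, and only add and del are covered.

```python
def run_edit_stack(tokens, models):
    """Evaluate a postfix sequence of model names and add/del operators."""
    stack = []
    for token in tokens:
        if token in ("add", "del"):
            second = stack.pop()   # last operand pushed
            first = stack.pop()    # the one pushed before it
            stack.append(first | second if token == "add" else first - second)
        else:
            stack.append(set(models[token]))
    if len(stack) != 1:
        raise ValueError("malformed edit expression")
    return stack[0]

# Invented miniature versions of the files in the example.
models = {
    "entity": {("entity", "label_en", "entity"), ("entity", "label_fi", "kokonaisuus")},
    "entity-fi": {("entity", "label_fi", "kokonaisuus")},
    "entity-edits": {("entity", "label_en", "entity"), ("entity", "label_fi", "olio")},
}

expr = "entity-fi entity entity-edits del del entity-edits entity del add"
fi_edited = run_edit_stack(expr.split(), models)
print(fi_edited)   # {('entity', 'label_fi', 'olio')}
```

Reading the expression from left to right: the first del computes what the edits removed from the original, the second del removes that from the active model, the third del computes what the edits added, and add puts it in.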

The operands to edit4tf can be model URIs or model strings from standard input. A bare filetype like .ttl is read from standard input in the format indicated by the filetype. An operand that starts with the string ?p= is interpreted as an EditForm edit URL . Empty operands can be indicated by supplying names of nonexistent files:

edit4tf modelID=olio.ttl - - .ttl ed < olio.ttl

All this edit operation does is save the edits in olio.ttl in the database cache under the given model ID. The dash placeholders for the missing active model and original on the line are not reserved; there just happens to be no file named dash.

index4tf

The index service can also be run from command line with the index4tf script:

index4tf
Usage: index4tf <fromURL> <toURL> notry verbose site=<DAV URL> user=<DAV username> pass=<DAV password>

For instance, command line

index4tf file:entity.xhtml http://purl.org/vocabularies/princeton/wn30/synset-entity-noun-1

uploads WordNet entry entity.xhtml to the default webdav directory under its official purl.org URI. The entry can be downloaded from the webdav cache as http://localhost/webdav/purl.org/vocabularies/princeton/wn30/synset-entity-noun-1/synset-entity-noun-1 . The default site is set with TF setting TF_INDEX_URL.

If toURL is not specified, the document located at fromURL is uploaded to webdav as fromURL. The user definable pseudo scheme dav+ is defined by a prefix mapping in etc/location-mapping.n3 as pointing to current value of TF_INDEX_URL. Note that dav+ is not a built in pseudo scheme, so it only works when location mappings are turned on.

To upload to the TDB triple store or to relational database, add pseudo scheme tdb+ or rdb+ respectively in the target upload location url. A pseudo scheme alone as toURL indicates to use the source URL as model ID, so that the following two commands mean the same.

index4tf file:ctryCode.xhtml tdb+file:ctryCode.xhtml
index4tf file:ctryCode.xhtml tdb+
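The rule for a bare pseudo scheme can be written down in a few lines of Python. The function name resolve_target is hypothetical and the sketch only models the naming rule stated above, not the upload itself.

```python
PSEUDO_SCHEMES = ("tdb+", "rdb+")

def resolve_target(from_url, to_url=None):
    """Work out the upload target for an index4tf style call."""
    if to_url is None:
        # No toURL: the document is uploaded under its source URL.
        return from_url
    if to_url in PSEUDO_SCHEMES:
        # A pseudo scheme alone: use the source URL as the model ID.
        return to_url + from_url
    return to_url

print(resolve_target("file:ctryCode.xhtml", "tdb+"))
print(resolve_target("file:ctryCode.xhtml", "tdb+file:ctryCode.xhtml"))
```

Both calls yield the same target, mirroring the equivalence of the two command lines.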

Indexing of documents to a webdav directory has the following safety feature against accidental overwriting.

index4tf http://localhost/active.ttl
copy ok http://localhost/dav/localhost/active.ttl length 1024
index4tf http://localhost/active.ttl
copy at http://localhost/dav/localhost/active.ttl exists, not overwritten
index4tf http://localhost/dav/localhost/active.ttl
copy ok http://localhost/dav/localhost/active.ttl length 211

If the location of the original does not point to the webdav directory, a copy is created for it in webdav. The copy can only be overwritten when it is accessed by its webdav directory url. This prevents accidentally overwriting the webdav copy another time with the (possibly already outdated) original.
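The guard can be modelled with a small Python sketch in which the webdav directory is replaced by a dictionary. The DAV root, the mapping from source URL to copy location, and the function names are assumptions read off the session above; the real index4tf may differ in detail.

```python
from urllib.parse import urlparse

DAV_ROOT = "http://localhost/dav/"   # assumed from the session shown above

def dav_location(url):
    """Map a source URL to the location of its copy in the dav directory."""
    if url.startswith(DAV_ROOT):
        return url
    parts = urlparse(url)
    return DAV_ROOT + parts.netloc + parts.path

def index(store, url, content):
    """Copy content to dav unless that would overwrite an existing copy."""
    target = dav_location(url)
    if target in store and url != target:
        # Addressed by its original URL: refuse to clobber the dav copy.
        return "copy at {} exists, not overwritten".format(target)
    store[target] = content
    return "copy ok {} length {}".format(target, len(content))

store = {}
print(index(store, "http://localhost/active.ttl", "x" * 1024))
print(index(store, "http://localhost/active.ttl", "x" * 1024))
print(index(store, "http://localhost/dav/localhost/active.ttl", "y" * 211))
```

The three calls reproduce the three steps of the session: first copy, refused overwrite, and overwrite through the dav address.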

Say we try to commit an edit of an original for which there already is a version on dav. The new edited version of the original fails to overwrite the existing version on dav. We can now choose between the following actions.

  1. Enter the address of the dav version as active ontology instead of the original and commit the edit. This edits the version on dav, creating a new version based on the existing version on dav.
  2. Enter a different upload address as toURL and commit the edit. This creates a new branch from the official version at the alternative address.
  3. Enter the location of the dav version as toURL and redo the edit. This overwrites the version on dav, amounting to a revert to original followed by edit.

It is of course wiser to check at the outset that the ontology to edit is the latest version. Location mappings can be set up to map the original to the editable version, making the check automatic.

The remainder of this chapter documents third party tools that may be used in working with TF.

Editors

This section documents editors for manual refinement of ontology and terminology data in different formats.

XML editors

Ontologies in RDF/XML and TF entries in XHTML, TBX or any other XML format can be syntactically edited using any XML editor. The advantage is that an XML editor can be told to be pedantic about syntax, and there are a lot more tools for XML than for RDF or OWL. The disadvantage is that an XML editor cannot check the semantics.

XMLmind XML editor

The XMLmind XML Editor (XXE) from Pixware is a customisable XML editor written in Java which allows editing large, complex, modular XML documents in a structured WYSIWYG mode, i.e. the page looks uncluttered but you control what you do as precisely as if you were using an XML programmer's editor. XML can be rendered to a large variety of formats using the open source Formatting Objects toolchain. It natively supports MathML 2 Presentation Markup. XXE is highly customizable, without programming, by local gurus and consultants.

The TermFactory toolkit includes plugins (configuration addons) for the free XML document editor XMLmind which allow structured editing of the LISA Oscar TBX terminology format and of the MultiTerm XML export format in WYSIWYG mode. The XMLmind XML editor user interface has been localized into Finnish.

It would not be difficult to construct an XMLmind skin for editing TF XHTML entries offline with XMLmind.

RDF/OWL editors

There is quite a selection of ontology editors to choose from. Listings are maintained at W3C , Wikipedia , and other places. A small sample is surveyed below.

Swoop

MINDSWAP Swoop OWL editor (version 2.3beta4 in 2007, no longer under development) was something like a testbed for the pellet reasoner. It still finds some use as a more or less direct graphical interface to pellet services. It has a funny venn diagram visualiser, an ontology partitioner which seems to only do proper partitions, so it may not be able to split a well-connected ontology. It has a query evaluator, but only for RQL queries. Keep in mind as a model or source for editing functionality in TF.

Protege

Protégé is a free, open source ontology editor and a knowledge acquisition system. Like Eclipse, Protégé is a framework for which various other projects suggest plugins. This application is written in Java and heavily uses Swing to create the rather complex user interface. Protégé has over 100,000 registered users. Protégé is being developed at Stanford University in collaboration with the University of Manchester. Version 3 was developed at Stanford and has the most contributed extensions. Version 4 is being developed by Manchester University and is built on the Java Eclipse IDE. It supports OWL 2.0. A side-by-side comparison of Protege 3 and 4 is found in the Protege Wiki .

(version 0.4) The current version of Protege used in TF development is version 4. It is downloadable from http://protege.stanford.edu/ .

Protege 4 (as of version 4.1, Aug 2011) has limited support for working with anonymous individuals. There are no editing facilities for anonymous individuals. There are several places where anonymous individuals don't show up in the Protege 4.1 graphical interface. This makes working with anonymous individuals difficult.

Protege plugins

There are many contributed plugins to Protege 3 and 4 , of which the following deserve mention here.

  • Collaborative Protege (3)
  • ProSE (4)

Collaborative Protege

Collaborative Protege is an extension of Protege 3 that supports collaborative ontology editing. (It is not known if or when Collaborative Protege comes to Protege 4.) In addition to the common ontology editing operations, it enables annotation of both ontology components and ontology changes. It supports the searching and filtering of user annotations, also known as notes, based on different criteria. Collaborative Protege implements two types of voting mechanisms that can be used for voting of change proposals. Multiple users may edit the same ontology at the same time. In multi-user mode, all changes made by one user are seen immediately by other users. There are two working modes available for Collaborative Protege. Both modes support multiple users working on an ontology:

  1. The multi-user mode - allows multiple clients to edit simultaneously the same ontology hosted on a Protege server. All changes made by one client are immediately visible by other clients. This mode is also referred to as client-server mode, or concurrent mode and requires a client-server setup. This mode is based on the implementation of the multi-user Protege and is the preferred mode in which Collaborative Protege should be run.
  2. The standalone mode - allows multiple users to access the same ontology in succession. The ontology can be stored on a shared network drive and all clients will access the same project files. However, simultaneous access is not possible. This mode is also referred to as the consecutive mode.

The next figure shows the collaborative protege graphical user interface.

Figure: Collaborative Protege

ProSE

The ProSE plugin guides a human user in choosing what to import from an ontology into another. In particular, it helps the user choose a big enough subset of the external ontology so that the risk of unintentionally indirectly asserting new relations between the imported concepts is minimised. It also helps make sure that the imported subset is no bigger than required for that purpose. In the TF scenario, one may or may not want to enrich relations between the imported concepts, depending on the case at hand. When needed, the ProSE plugin can be used to determine a suitable set to import.

TopBraid Suite

As part of TopBraid Suite, Composer incorporates a flexible and extensible framework with a published API for developing semantic client/server or browser-based solutions, that can integrate disparate applications and data sources.

Implemented as an Eclipse plug-in, Composer serves as a development environment for TopBraid Ensemble™ and for all the applications delivered using TopBraid Live™. Composer is used to develop ontology models, configure data source integration as well as to customize dynamic forms and reports.

Two versions are available - Standard Edition and Maestro Edition.

NeON Toolkit

NeOn is a 14.7 million Euros project involving 14 European partners and co-funded by the European Commission’s Sixth Framework Programme under grant number IST-2005-027595. NeOn started in March 2006 and has a duration of 4 years. Our aim is to advance the state of the art in using ontologies for large-scale semantic applications in the distributed organizations. Particularly, we aim at improving the capability to handle multiple networked ontologies that exist in a particular context, are created collaboratively, and might be highly dynamic and constantly evolving.

The first release of the NeOn Toolkit, one of the core outcomes of the NeOn project, is available for download and testing from the NeOn Toolkit and Community site.

(From NeOn Wiki:) The NeOn toolkit is a state-of-the-art, open source multi-platform ontology engineering environment, which aims to provide comprehensive support for all activities in the ontology engineering life-cycle. The toolkit is based on the Eclipse platform, a leading development environment. The toolkit provides an extensive set of plug-ins (currently 45 plug-ins are available) covering all aspects of ontology engineering, including:

  • relational database integration
  • modularisation
  • visualisation
  • alignment
  • project management

The NeOn Toolkit is part of the reference implementation of the NeOn architecture. The major goal of the NeOn project is to provide methodology, infrastructure and tools for designing and managing a new generation of knowledge-intensive semantic applications. The toolkit is implemented as an Eclipse application.

NeOn DIG plugin

A DIG reasoner interface plugin is advertised for NeOn as follows:

The purpose of the DIG Plugin is the implementation of the DIG Interface version 1.1. The DIG Description Logics Interface Version 1.1 is a specification for defining a new interface for DL Systems. It is effectively an XML Schema for a DL concept language along with ask/tell functionality. In two words, the DIG interface provides a standardized way to access and query a reasoner. The user initially chooses the desired action she is interested in. Then the ontology is translated into the DIG interface and sent to the reasoner along with the queries that have been posed. After the query processing inside the reasoner has taken place, the reasoner sends back to the user the response encoded in the DIG Interface and the user can extract the answer to her query. The interested reader is referred to the protocol specification for further information.

The entire concept language and tell/ask functionality is enough to capture every functionality that is usually provided by a reasoner. In the context of the use cases in the NeOn-Project the reasoning tasks that are of utmost importance to us are ontology coherency and classification. More concretely:

  • Ontology Coherency: The reasoner takes as input the (translated into the DIG protocol) ontology and for each concept it returns true or false, depending on whether the corresponding concept is satisfiable or unsatisfiable, respectively.
  • Classification: The reasoner takes as input the (translated into the DIG protocol) ontology and returns the inferred classification of the various concepts, as opposed to the explicit one that the user initially sees. Moreover, for every concept in the inferred hierarchy we get whether it is satisfiable or not.

It must be highlighted at this point that the DIG Description Logics Interface that has been partially implemented in this plugin does not come bound to any reasoner. Contrary to that, it only provides an interface to query any reasoner that supports the DIG Protocol. The motivation behind this is that the current plugin is a general-purpose plugin that is intended to be used with different reasoners, so we found it meaningful not to restrict it to a particular reasoner, but to rather provide the user with the flexibility to do that themselves, in full accordance with their needs.

OwlSight

OwlSight is a web based ontology browser from ClarkParsia that uses the Pellet reasoner. The browser is written with the Google Web Toolkit and it uses the OWL API to access ontologies.

Ontology Browser

Ontology Browser is a tool to dynamically generate documentation for ontologies, based on the OWLDoc software. It uses the OWL API to access ontologies and has an interface to the FaCT++ reasoner.

OntoTrack

OntoTrack is a browsing and editing tool for OWL ontologies developed at Ulm University using the OWL API.

OntoWiki
OntoWiki is a tool providing support for agile, distributed knowledge engineering scenarios.

OntoWiki facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents.

Comparing OWL editors for TF

We have tried three Java Eclipse-based OWL editors with TF ontologies. Here are some of our observations:

The combined PULS/BioCaster epidemic ontology contains about 3000 classes (owl:Class 2709) and 20K instances (owl:Thing 22286). There are about 300K asserted triples. The OWLIM reasoner adds another 600K so the inference closure contains about 1M triples.

TopBraid loads the ontology in 00 seconds. It takes a couple of minutes to form the closure with OWLIMSwift. TopBraid feels triple oriented. It seems reasonably robust and provides some useful services (e.g. instance statistics exported in spreadsheet form).

NeOn toolkit takes several minutes to load epi.owl (downloading from the web). Most of the time is reported as spent reading RDF triples and building indices. The class editor is comparable to Protege. The individual editor does not create hyperlinks to object property objects. Reasoners are not mentioned on the NeOn Toolkit GUI (Version: 1.2.3 Build id: B1023 (2009-07-30)). NeOn has little built-in support for browsing and editing TF ontologies.

Protege 4 (version 4.0.114) has fewer features than Protege 3, but is the best suited of the three for browsing TF ontologies, thanks to bundled reasoner support and hyperlinks.

For Protege 4 help, see http://protegewiki.stanford.edu/wiki/Protege4UserDocs .

Validators

A validator is a computer program used to check the validity or syntactical correctness of a fragment of code or document. The term is commonly used in the context of validating HTML, CSS and XML documents or RSS feeds though it can be used for any defined format or language. A reasoner (semantic reasoner, reasoning engine, rules engine) is a piece of software able to infer logical consequences from a set of asserted facts or axioms.

TF validation can be done using XML validators, RDF validators, OWL syntax validators, OWL semantic reasoners, and specially built TermFactory tools (see TF3). It is advisable to check third party ontologies with a validator before conversion to spot coding errors in incoming data.

W3C RDF validation service

The W3C RDF validation service is based on Another RDF Parser (ARP). The service supports the Last Call Working Draft specifications issued by the RDF Core Working Group, including datatypes. The service does not do any RDF Schema Specification validation. Note that other online RDF validation services are available.

WonderWeb OWL validator

WonderWeb OWL validator can be used to check the conformance of TF ontologies to the OWL 1.0 standard. TF adheres to OWL DL in order to benefit from ontology reasoners.

Validation with reasoners

The Pellet reasoner offers various tools for ontology validation. In particular, pellet consistency and pellet explain are handy:

pellet help consistency
PelletConsistency: Check the consistency of an ontology
Usage: pellet consistency [options] <file URI>...
Argument description:
  --help, -h  Print this message
  --verbose, -v  Print full stack trace for errors.
  --config, -C (configuration file)  Use the selected configuration file
  --loader, -l (Jena | OWLAPI | OWLAPIv3 | KRSS)  Use Jena, OWLAPI, OWLAPIv3 or KRSS to load the ontology (Default: OWLAPIv3)
  --ignore-imports  Ignore imported ontologies
  --input-format (RDF/XML | Turtle | N-Triples)  Format of the input file (valid only for the Jena loader). Default behaviour is to guess the input format based on the file extension.

pellet help explain
PelletExplain: Explains one or more inferences in a given ontology including ontology inconsistency
Usage: pellet explain [options] <file URI>...
The options --unsat, --all-unsat, --inconsistent, --subclass, --hierarchy, and --instance are mutually exclusive. By default --inconsistent option is assumed. In the following descriptions C, D, and i can be URIs or local names.
Argument description:
  --help, -h  Print this message
  --verbose, -v  Print detailed exceptions and messages about the progress
  --config, -C (configuration file)  Use the selected configuration file
  --ignore-imports  Ignore imported ontologies
  --unsat (C)  Explain why the given class is unsatisfiable
  --all-unsat  Explain all unsatisfiable classes
  --inconsistent  Explain why the ontology is inconsistent
  --hierarchy  Print all explanations for the class hierarchy
  --subclass (C,D)  Explain why C is a subclass of D
  --instance (i,C)  Explain why i is an instance of C
  --property-value (s,p,o)  Explain why s has value o for property p
  --method, -m (glass | black)  Method that will be used to generate explanations (Default: glass)
  --max, -x (positive integer)  Maximum number of generated explanations for each inference (Default: 1)

Reasoners

A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine, by providing a richer set of mechanisms to work with. The inference rules are commonly specified by means of an ontology language, and often a description language. Many reasoners use first-order predicate logic to perform reasoning; inference commonly proceeds by forward chaining and backward chaining.

This section documents ontology query and rule languages and reasoners that may be used in TF. For a survey see Sattler .

Jena ARQ and SPARQL query language

ARQ is an open source SPARQL language query engine over RDF graphs implemented in Java.

Jena ARQ engine implements DESCRIBE queries in a fixed way (it uses a one-level, all-properties query). The result set is different and usually smaller than from TF DESCRIBE. Jena ARQ reads RDF models, so does not handle ontology imports (imports are an OWL construct). On the other hand, arq command-line accepts multiple --dataset arguments.

Fact++

FaCT++ is the new generation of the well-known FaCT OWL-DL reasoner. FaCT++ uses the established FaCT algorithms, but with a different internal architecture. FaCT++ is implemented in C++ in order to create a more efficient software tool, and to maximise portability. FaCT++ is available in the Protege editor.

Pellet

Pellet is an open source reasoner for OWL 2 DL in Java. It provides standard and cutting-edge reasoning services for OWL ontologies. Pellet supports queries in SPARQL-DL , an OWL-DL query language that syntactically extends the SPARQL query language for RDF.

We have considered the feasibility of, and need for, adapting the Pellet SPARQL-DL reasoner / query engine to the TF repository network so that it can carry out federated queries in a network of TF repositories. So far, it appears we can get by with reasoning locally, the way Pellet works out of the box.

There are two basic needs for reasoning: one during ontology development, to check that no conflicts are created and to inform peers about changes, and another during querying. In each case, it seems feasible from an architectural point of view, and preferable from an efficiency point of view, to reason over preloaded, locally cached ontologies rather than download imports during tableau construction. See also separate paper .

Pellet is available in the Protege editor.

The current Pellet4TF reasoner is based on Pellet 2.0 release candidate 7. The Pellet 2.0 Release (November 16, 2009) includes

  • full OWL 2 support (modulo a few bugs that will be fixed in the 2.1 release)
  • supports domain and range axioms, class expressions, qualified cardinality restrictions, literal constants, annotations, and nested class expressions in SPARQL queries
  • support for all SWRL builtins, including previously missing builtins (substring, tokenize, and optional precision parameters for roundHalfToEven)
  • optimized support for OWL 2 EL reasoning; OWL 2 EL reasoner is autoselected based on data input
  • supports automated ontology module extraction
  • supports incremental classification
  • supports fine-grained inference extraction
  • enhanced SWRL rules performance
  • OWLAPI v3 support
  • lots of improvements, cleanups to Pellet’s command line tools
  • updated to work with Jena 2.6.2 — Pellet is the only DL reasoner available from Jena
  • supports explanations via Jena
  • support autoselecting best SPARQL query engine based on input query
  • user-defined timeouts for reasoning
  • switch to dual license model to support commercial and open source projects

This release marks a change in Pellet development process: starting with 2.1, Pellet will be released according to a time-based development cycle. We will do four quarterly releases per year. We will make point releases between the quarterly releases, as necessary, to fix critical bugs only. Thus, the release schedule for the 2.x series will be 29 March 2010, 28 June 2010, 27 September 2010, 20 December 2010.

The following notes are from pellet.owldl.com/downloads/pellet-tutorial.pdf .

Pellet can be used via three different APIs

  • Internal Pellet API
  • Manchester OWLAPI
  • Jena API

Each API has pros and cons. Choice will depend on your applications’ needs and requirements.

Pellet Internal API
  • API used by the reasoner
  • Designed for efficiency, not usability
  • Uses ATerm library for representing terms
  • Fine-grained control over reasoning
  • Misses features (e.g. parsing & serialization)
  • Pros: Efficiency, fine-grained control
  • Cons: Low usability, missing features
Manchester OWLAPI
  • API designed for OWL
  • Closely tied to OWL structural specification
  • Support for many syntaxes (RDF/XML, OWL/XML, OWL functional, Turtle, ...)
  • Native SWRL support
  • Integration with reasoners
  • Support for modularity and explanations
  • Pros: OWL-centric API
  • Cons: Not as stable, no SPARQL support
  • More info: http://owlapi.sf.net
Jena API
  • RDF framework developed by HP labs
  • An RDF API with OWL extensions
  • In-memory and persistent storage
  • Built-in rule reasoners and integrated with Pellet
  • SPARQL query engine
  • Pros: Mature and stable and ubiquitous
  • Cons: Not great for handling OWL, no specific OWL 2 support
  • More info: http://jena.sf.net
Advanced Pellet programming

Main processing/reasoning steps:

  1. Loading data from Jena to Pellet
  2. Consistency checking
  3. Classification [Optional]
    • Compute subClassOf and equivalentClass inferences between all named classes
  4. Realization [Optional]
    • Compute instances for all named classes

Steps should be performed in the given order. No need to repeat any of the steps unless the underlying data changes. Loading and consistency checking mandatory. Classification and realization optional, performed only if required by a query.

  • Queries triggering classification:
    • Querying for equivalent classes
    • Querying for (direct or all) sub/super classes
    • Querying for disjoint/complement classes
  • Queries triggering realization:
    • Querying for direct instances of a class
    • Querying for (direct or all) types of an individual
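The ordering of the steps, and the way queries trigger the optional ones, can be illustrated with a small sketch. This is a toy model, not the Pellet API: the class ReasoningPipeline and all of its method names are hypothetical, and the actual reasoning is replaced by placeholder answers. It only records which steps have been performed.

```python
class ReasoningPipeline:
    """Toy model of the load -> consistency -> classify -> realize order.

    Hypothetical illustration only; none of these names exist in Pellet.
    """

    STEPS = ("load", "consistency", "classification", "realization")

    def __init__(self):
        self.done = []  # steps already performed, in order

    def _ensure(self, step):
        # Run every step up to and including `step` that has not run yet.
        for s in self.STEPS[: self.STEPS.index(step) + 1]:
            if s not in self.done:
                self.done.append(s)

    def data_changed(self):
        # A change in the underlying data invalidates all earlier steps.
        self.done.clear()

    def is_consistent(self):
        self._ensure("consistency")  # loading and consistency are mandatory
        return True  # placeholder answer

    def subclasses_of(self, cls):
        self._ensure("classification")  # this kind of query triggers classification
        return []  # placeholder answer

    def instances_of(self, cls):
        self._ensure("realization")  # this kind of query triggers realization
        return []  # placeholder answer


pipeline = ReasoningPipeline()
pipeline.is_consistent()
print(pipeline.done)
pipeline.instances_of("Disease")
print(pipeline.done)
```

In this sketch a consistency check alone never causes classification or realization, and no step is repeated until data_changed() is called.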

An axiom can be interpreted with open world assumption (OWA) - regular OWL axiom, or closed world assumption (CWA) - integrity constraint (IC). How to use ICs in OWL? Two easy steps:

  1. Specify which axioms should be ICs
  2. Validate ICs with Pellet
  • Ontology developer
    • Develop ontology as usual
    • Separate ICs from regular axioms (annotation, separation of files, named graphs, ...)
  • Pellet IC validator
    • Translates ICs into SPARQL queries automatically
    • Execute SPARQL queries with Pellet
    • Query results show constraint violations
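The closed-world reading of an integrity constraint can be shown with a minimal sketch. It does not use Pellet or SPARQL; it only mimics, over an in-memory set of triples, the kind of check that a generated query would perform. The function name ic_violations, the ex: prefix and the sample data are all assumptions made for illustration.

```python
def ic_violations(triples, cls, prop):
    """Closed-world check: every instance of `cls` must have a `prop` value.

    Returns the sorted list of individuals that violate the constraint.
    """
    instances = {s for (s, p, o) in triples if p == "rdf:type" and o == cls}
    with_value = {s for (s, p, o) in triples if p == prop}
    return sorted(instances - with_value)


# Hypothetical sample data.
triples = {
    ("ex:flu", "rdf:type", "ex:Disease"),
    ("ex:flu", "rdfs:label", "influenza"),
    ("ex:d49", "rdf:type", "ex:Disease"),  # no label asserted
}
print(ic_violations(triples, "ex:Disease", "rdfs:label"))  # ['ex:d49']
```

Under the open world assumption the missing label would not be an error, since a label might exist without being asserted. Read as an integrity constraint, the same axiom reports ex:d49 as a violation.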

The Pellet reasoner can be used to query TF ontologies as follows.

pellet.sh query -q query.sparql file:///home/lcarlson/Data/CF/TF/cnv/bio/bc2e.owl

Here is an example query:

# Give me all items that are members of DISEASE and
# tell me all classes they belong to.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX bc: <http://biocaster.nii.ac.jp/biocaster#>
CONSTRUCT { ?X rdf:type ?C }
WHERE {
  ?X rdf:type bc:DISEASE .
  ?X rdf:type ?C .
}

The result of the query is a sub-ontology which reveals that the BioCaster ontology makes some category errors, in particular, these triples:

<rdf:Description rdf:about="http://biocaster.nii.ac.jp/biocaster1#NON_HUMAN_138">
  <rdf:type rdf:resource="http://biocaster.nii.ac.jp/biocaster#DISEASE_24"/>
</rdf:Description>
<rdf:Description rdf:about="http://biocaster.nii.ac.jp/biocaster1#DISEASE_49">
  <rdf:type rdf:resource="http://biocaster.nii.ac.jp/biocaster#SubCountry"/>
</rdf:Description>
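What the graph pattern of the example query selects can be sketched over a plain set of triples. This is a simplified illustration without any reasoner, so it only sees asserted types; the function name construct_types and the sample individuals are hypothetical.

```python
RDF_TYPE = "rdf:type"


def construct_types(triples, cls):
    """Return all rdf:type triples of the individuals typed as `cls`."""
    members = {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}
    return sorted(
        (s, p, o) for (s, p, o) in triples if p == RDF_TYPE and s in members
    )


# Hypothetical sample data: x is typed both as a disease and as a place.
triples = {
    ("x", RDF_TYPE, "bc:DISEASE"),
    ("x", RDF_TYPE, "bc:SubCountry"),
    ("y", RDF_TYPE, "bc:PERSON"),
}
for triple in construct_types(triples, "bc:DISEASE"):
    print(triple)
```

A member of DISEASE that also shows up with an unrelated type, like x here, is the kind of category error the query is meant to expose.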

Pellet does not support DESCRIBE queries. The Pellet engine only queries graph patterns. The Mixed engine uses the Pellet engine to do the graph pattern part of a query and Jena ARQ for other SPARQL constructs. The Pellet ARQ engine option loads an ontology into a Pellet KB and runs the ARQ engine on triples retrieved by the Pellet reasoner.

HermiT

HermiT (by Oxford University) is the first publicly-available OWL reasoner based on a novel “hypertableau” calculus which provides much more efficient reasoning than any previously-known algorithm. Ontologies which previously required minutes or hours to classify can often be classified in seconds by HermiT, and HermiT is the first reasoner able to classify a number of ontologies which had previously proven too complex for any available system to handle.

Hermit is built on OWL 2.0. Hermit uses OWLAPI 3, so it is not integrated into Protege until Protege 4.1. It is particular about datatypes: an ontology whose property instances do not explicitly specify the datatype required by the property type is inconsistent for Hermit.

Hermit 1.0 does not have a query language, but it has some built-in queries. For instance, one can query both direct and indirect superclass and subclass relationships. (This is something neither Pellet nor SPARQL does; they only do one or the other.) Hermit reads a variety of formats, but the only output syntax is OWL functional style. Here is a list of the built-in query types:

Actions:
  -l, --load  parse and preprocess ontologies (default action)
  -c, --classify[=FILE]  classify ontology, optionally writing taxonomy to FILE (use - for standard out)
  -k, --consistency[=CLASS]  check satisfiability of CLASS (default owl:Thing)
  -d, --direct  restrict next subs/supers call to only direct sub/superclasses
  -s, --subs=CLASS  output classes subsumed by CLASS (or only direct subs if following --direct)
  -S, --supers=CLASS  output classes subsuming CLASS (or only direct supers if following --direct)
  -e, --equivalents=CLASS  output classes equivalent to CLASS
  -U, --unsatisfiable  output unsatisfiable classes (equivalent to --equivalents=owl:Nothing)
  --print-prefixes  output prefix names available for use in identifiers
  -E, --checkEntailment  check whether the premise (option premise) ontology entails the conclusion ontology (option conclusion)
Internals and debugging (unstable):
  --dump-clauses[=FILE]  output DL-clauses to FILE (default stdout)
  --taxonomy  output the taxonomy

The Hermit reasoner can be used to query TermFactory ontologies using the following command line script:

#!/bin/sh
export HERMIT_HOME=/opt/hermit
#java -jar $HERMIT_HOME/HermiT.jar "$@"
java -jar $HERMIT_HOME/HermiT.jar --ignoreUnsupportedDatatypes "$@"

With this script, command

hermit --ignoreUnsupportedDatatypes -k TFS.owl
<http://www.w3.org/2002/07/owl#Thing> is satisfiable.

tests TFS.owl for consistency, and

hermit -ds b:Object TFS2.owl
Direct sub-classes of 'b:Object':
  c:Term
  b:Concept
  k:Expression

prints out the direct subclasses of ont:Object . (The prefixes b, c etc. are chosen by Hermit. They can be printed out with --print-prefixes switch.)

OWLIM

OWLIM is a high-performance semantic repository implemented in Java and packaged as a Storage and Inference Layer (SAIL) for the Sesame RDF database. OWLIM is based on TRREE, a native RDF rule-entailment engine. The supported semantics can be configured through rule-set definition and selection. The most expressive pre-defined rule-set combines unconstrained RDFS and OWL Lite. Custom rule-sets allow tuning for optimal performance and expressivity. In SwiftOWLIM, reasoning and query evaluation are performed in memory. A principal limitation of OWLIM is the relatively slow delete operation. Upload, reasoning, and query evaluation proceed extremely fast even against huge ontologies and knowledge bases. OWLIM can manage millions of explicit statements on desktop hardware. According to public evaluation data, SwiftOWLIM was the fastest OWL repository available (2007). OWLIMSwift is the default reasoner in TopBraid Composer (Version: 3.2.1.v20091123-1212R).

Visualizers

This section documents tools for graphical presentation of ontologies.

The TermFactory Visualizer TFVisu

A first version of the TermFactory graph visualizer exists. The core graphics and web service code was written by Seppo Nyrkkö. TFVisu has now been integrated in the TF web service backend architecture. The visualizer service allows reading ontology URLs in TURTLE format and choosing seed resources whose neighborhood(s) the tool visualizes as an RDF-style labeled line-and-circle diagram (circles represent nodes, lines arcs). Images requested from the visualizer are currently cached in an SQL database held at the web service client side.

Given two URLs representing two versions of the same ontology, TFVisu shows their join and differences in three diagrams. Example:

Figure: TFVisu visualizer example.

Bramble

Bramble is a framework which visualises RDF graphs in a native browser environment, using the SVG standard and JavaScript technology. Graphs can be directly expanded, modified and explored. Users select nodes and edges from a central data repository containing millions of statements. The resulting graph can be shared with other users, retaining full interactivity for collaborative work or presentation purposes.

Software libraries

There are a number of open-source program libraries for handling RDF graphs and OWL ontologies, notably OpenRDF, Jena, and the OWL API. The TF project currently uses the HP Jena Java RDF/OWL code library as well as (in a small way) the OpenRDF Sesame RDF stack. We also use the OWL API in connection with the Pellet TF reasoner.

OpenRDF Sesame

Sesame is an open-source framework for querying and analyzing RDF data. It was created, and is still being maintained, by the Dutch software company Aduna. It was originally developed as part of "On-To-Knowledge", a semantic web project that ran from 1999 to 2002. It contains a triplestore. OWLIM is based on Sesame.

One of the primary components of Sesame is the RDF query language SeRQL. Another component is Elmo, an API that allows for querying of specific ontologies like RSS, FOAF and Dublin Core.

Jena Library

Jena is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract "model". A model can be sourced with data from files, databases, URLs or a combination of these. A Model can also be queried through SPARQL and updated through SPARUL.

Jena was originally developed by HP Labs. It later moved to Apache and is now licensed under the Apache Software License. Website: http://incubator.apache.org/jena . Download: http://incubator.apache.org/jena/download/ . Source: README, updated 2011-12-28.

Jena is similar to Sesame; though, unlike Sesame, Jena provides support for OWL (Web Ontology Language). The framework has various internal reasoners and also provides support for external reasoners through the DIG interface. In addition the Pellet reasoner (an open source Java OWL-DL reasoner) can be set up to work in Jena without using the DIG interface. This allows for improved speed and overcomes some of the limitations in the DIG protocol.

Jena supports serialisation of RDF graphs to:

  • a relational database
  • RDF/XML
  • Turtle
  • Notation 3

Jena offers some useful command line tools for working with models at http://jena.sourceforge.net/tools.html .

OWL API Library

The OWL API is a Java interface and implementation for the W3C Web Ontology Language OWL. The latest version of the API is focused on OWL 2, which encompasses OWL-Lite, OWL-DL and some elements of OWL-Full. The OWL API is open source and is available under the LGPL License. The OWL API includes the following components:

  • An API for OWL 2 and an efficient in-memory reference implementation
  • RDF/XML parser and writer
  • OWL/XML parser and writer
  • OWL Functional Syntax parser and writer
  • Turtle parser and writer
  • KRSS parser
  • OBO Flat file format parser
  • Support for integration with reasoners such as Pellet and FaCT++
  • Support for black-box debugging

The original version of the API for OWL 1.0 was developed as part of the WonderWeb Project. The latest version of the OWL API for OWL 2.0 was developed as part of the CO-ODE project and the TONES project.

Contributors

The OWL API is primarily maintained at the University of Manchester, but there have been significant contributions from various groups and companies:

  • Clark & Parsia LLC (Explanation code, Modularity code)
  • University of Ulm (KRSS2 syntax parser and renderer)

The main purpose of the Manchester OWL API is to provide support for OWL 2. The latest version (3.0 in 2010) is the engine used in Protege 4. It supports parsing and rendering ontologies in RDF/XML, Turtle, OWL/XML, OWL Functional Syntax, and Manchester OWL Syntax. It provides a reasoner interface to CEL, Fact++, HermiT, Pellet, and RacerPro. OWL API may be downloaded via SourceForge.

Services

TF web services concept

This section documents TF web services. With a web services API, TF aims at a loose coupling between back end and front end, keeping the TermFactory back end open for alternative front-end designs and implementations. The simple reason is that front ends have very short shelf lives. We noted it does not pay to fix upon any particular platform. Instead, TF is to provide a relatively wear-resistant content design and workflow, a generic back end design and implementation, and a flexible front end design plus a light demonstrator of the same.

WS overview

Figure: TermFactory services overview.

Behind the scenes, TF nodes communicate using web services. As shown in the figure above, there is more than one way in which TF involves web services. Starting from the bottom, the network of TF repositories appears to higher layers as an ontology service. Above that, the TF query backend includes the TF federated reasoner, which allows submitting distributed queries to a repository network. The TF edit service allows saving and retrieving entries for editing from repositories. On top of that, there is the TermFactory API, which supports embeddable services to TF front ends such as text area editors and commenting / voting systems. TF front ends can be implemented natively on different collaborative platforms, included in them as more standalone server side plugins, or merged on existing pages as client side mashups.

A similar approach was taken in the Finnish KITES Multilingual Workbench/Desktop project. MLWB tools are typically used through organization’s own productivity tools such as Microsoft Word, business portals, and content management systems, or through their own user interfaces.

Figure: MLWB architecture.

We first considered using "big web services", including the XML based SOAP, WSDL, BPEL, and UDDI standards. The advantage seemed to be that there was a lot of existing code. On the downside, the resulting overall system could be expected to become like a tectonic plate: huge and slow. There were many inertial components involved: Java, XML, OWL, Web...

The following was an early cast for a TF web service system design: A QueryForm client forms a SPARQL-DL query from user input. A TF service contains the distributed reasoning engine, an extension of the Pellet reasoner along the lines of the federated reasoners Drago and PDL. The reasoning engine uses UDDI to map graph and term URLs to TF instance URLs. As a reasoner proceeds, it sends subqueries for data through the BPEL service or to one or more "local" Query Services or to other TF repositories. The BPEL service schedules and optimises such subqueries and relays the results back. The TF service aggregates the results and returns them to the client. Labels like QueryForm, BPEL and UDDI are only suggestive of the type of functionality meant.

A central ontology registry and workflow manager was first planned to be implemented using XML based web service techniques ("Big Web Services", including UDDI and BPEL). But a centralised registry architecture was subsequently abandoned as too complex and fragile. Lucky for TF, for by now UDDI seems dead and BPEL is not much better off. The Axis2 WSDL service layer is not doing much work either, to tell the truth.

Another proposal for an ontology registry is Oyster provided in the Neon Toolkit . It uses peer-to-peer networking to contact repositories.

The TF needs that first seemed to call for centralized cataloguing and orchestrating services include these:

  1. How to tell which TF concepts belong to which TF ontology
  2. How to tell which TF ontologies are managed with which TF repository server
  3. How to delegate term queries down the line when ontologies import terms from ontologies managed by other repositories
  4. How to check that a given cached description of a term is up to date relative to the ontologies it imports from

On further thought, it started to appear that many if not all of these problems can be solved (or circumvented) by judicious application of existing web addressing technology. The key is to conform to standard web addressing orthodoxy. The orthodoxy is to make web content associated to a given URI simply available at that location (see e.g. http://www.w3.org/Addressing/ ).

1. An example of the first question is how to locate the TF description of a concept URI like http://tfs.cc/ont/English in the TF repository system. The answer is now relatively simple. It is enough for a web client to send a GET request to the URL http://tfs.cc/ont/English . The request goes to the webserver at http://tfs.cc . Since this server is a TF repository server, it uses the server redirect capability to translate the incoming URI ont/English to http://localhost:8080/TermFactory/query?uri=http://tfs.cc/ont/English . This is a TF query service URL which is handled by the local Tomcat application server at the tfs host. Further redirects can be defined in TF location mappings . Such mappings can also register which ontologies should be searched for a given term URI or prefix name (QName).
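The redirect step can be sketched as a plain string mapping. This is only an illustration of the idea, not the actual server configuration: the function name to_query_url and the rule that other hosts are left untouched are assumptions, while the host and the query service URL are the ones quoted in the paragraph above.

```python
from urllib.parse import urlsplit

# Query service URL as quoted in the text above.
QUERY_SERVICE = "http://localhost:8080/TermFactory/query"


def to_query_url(uri, host="tfs.cc"):
    """Map a concept URI served by `host` to the local query service URL.

    URIs on other hosts are returned unchanged (no redirect applies).
    """
    if urlsplit(uri).netloc != host:
        return uri
    return f"{QUERY_SERVICE}?uri={uri}"


print(to_query_url("http://tfs.cc/ont/English"))
```

The sketch leaves the uri parameter unencoded, as it appears in the text; a real client would normally percent-encode it.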

2. The second question is similarly a non-issue: by web addressing orthodoxy, the ontology URL tells where the ontology is.

3. As for TF imports , TermFactory imports are identified by TF URLS. That is, a TF query import URI points to a TF service which returns the result of a query as an OWL model (possibly one cached in the repository database or one saved as text files on the server).

Loading an ontology at a node can indirectly cause relayed queries at any number of other nodes, possibly circling back to the first node. In the OWL standard, cyclic import (causing mutual or self-import) is allowed and entails identity (equivalence). Cycling is checked in Jena by keeping a list of imported ontologies during model load and checking whether the URL of a model to load is already on the list. TF imports are covered as a special case of Jena OWL imports.
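The cycle check described here amounts to loading with a visited list. The following is a minimal sketch of that technique, not Jena's code: the function load_with_imports and the imports_of table, which stands in for actually fetching and parsing ontologies, are hypothetical, as are the example URLs.

```python
def load_with_imports(url, imports_of, loaded=None):
    """Load `url` and everything it imports, visiting each ontology once.

    `imports_of` maps an ontology URL to the URLs it imports; it stands in
    for actually fetching and parsing the ontology.
    """
    if loaded is None:
        loaded = []
    if url in loaded:  # already on the list: a cycle or a shared import
        return loaded
    loaded.append(url)
    for imported in imports_of.get(url, []):
        load_with_imports(imported, imports_of, loaded)
    return loaded


# Two ontologies that import each other.
imports_of = {
    "http://a.example/ont": ["http://b.example/ont"],
    "http://b.example/ont": ["http://a.example/ont"],
}
print(load_with_imports("http://a.example/ont", imports_of))
# ['http://a.example/ont', 'http://b.example/ont']
```

Because the URL is put on the list before its imports are followed, the mutual import terminates after each ontology has been loaded once.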

4. The question about version updates is solved internally on a per-server basis, as explained in the section on cache updates . The main point is that each term query result brings along with it version information about imported ontologies. When a site updates an ontology, it updates the version information saved in the server's ont-policy file, where it gets looked up by the location mapping retry facility.

So it is turning out that the TF repository network can be built without big web services. The functions of boxes labeled UDDI and BPEL in the blueprint are getting solved with existing uri redirection services running locally in each TF instance. Some background theory and a general outline of the design is given in a separate paper .

In version 1.2, a TF repository network exists as it were implicitly, through TF uri naming conventions. Each TF server operates independently; there is no privileged communication between such servers beyond mutual query service requests. It is not the servers that depend on one another, but the ontologies stored in their repositories. An ontology housed in one server becomes dependent on another TF server through importing some (small) ontology whose uri is maintained by that other server (simply because the imported ontology uri, directly or through location mappings, queries the other server).

Version 1.2 TF repositories work completely independently of one another, communicating only through repository URLs. Adding a new repository does not require any updating in the rest of the network. Removal of a repository just means that some queries will no longer return answers. The only way a TF repository hierarchy comes about (at the current state of the implementation, when no attention has yet been paid to business aspects) is through the internal logic of the participating ontologies. A top ontology is one which is imported (from) by many ontologies; a leaf ontology is not imported (from) by other ontologies.
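How such a hierarchy falls out of the import relation alone can be shown with a small sketch. The function import_roles and the three ontology names are hypothetical; the import table stands in for the owl:imports statements of real ontologies.

```python
from collections import Counter


def import_roles(imports_of):
    """Count how often each ontology is imported and list the leaves.

    `imports_of` maps each ontology to the ontologies it imports.
    Returns (imported_by_count, leaves), where leaves are the ontologies
    that no other ontology imports.
    """
    counts = Counter(t for targets in imports_of.values() for t in targets)
    everyone = set(imports_of) | set(counts)
    leaves = sorted(o for o in everyone if counts[o] == 0)
    return counts, leaves


# Hypothetical network: two domain ontologies import a shared top ontology.
imports_of = {
    "epi": ["top"],
    "bio": ["top"],
    "top": [],
}
counts, leaves = import_roles(imports_of)
print(counts["top"], leaves)  # 2 ['bio', 'epi']
```

No registry is consulted: the roles of top and leaf ontology are read off the imports themselves, which is the point made in the paragraph above.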

(version 0.1) An initial implementation of the TF backend exists. It sends Jena SPARQL queries to a persistent TF Jena OWL database in MySQL over the web using Axis2 POJO (plain old Java object) web service calls. This implements a baby version of a standalone TF repository, i.e. the aqua, green, blue, and red boxes in the following figure.

(version 1.0) An improved implementation of the TF backend exists. It is implemented as a set of Axis2 web services serving a webapp / RESTful web API done with Java servlets. The following is the current Axis2 listing of TF services. The orange boxes labeled BPEL and UDDI, i.e. the repository network choreography, remain to be implemented.

TF API

This section documents the TermFactory Web API (web application programming interface). It lists the functionalities offered by the TF repository server network which different user interfaces or platforms can exploit according to their needs and capabilities.

  • Query
  • Edit
  • Index

The TF web services

A simple-minded but reasonably full implementation of the TF ontology repository query backend exists. It is implemented as a Tomcat webapp backed up by a set of Tomcat hosted Axis2 web services. The servlets in the web application access the Axis2 services as usual in Java through an Axis2 client. On top of this, the webapp servlets constitute a more lightweight REST style web API layer, accessed using HTTP GET (URLs) or POST. The following is the current Axis2 listing of the back-end TF services. The box labeled UDDI is in effect implemented with Jena location mappings, and the box labeled BPEL is implemented with the location mapping retry facility in QueryService.

A TermFactory site network consists of third party collaborative wiki/forum front end servers and TF repository back end servers. The front ends communicate with the back ends through TF mashups for querying, editing etc. The back-ends communicate with one another through webservice requests and responses. There may also be a common directory/registry that tells the nodes of the network where to look for what kind of content. The repository servers store and communicate TF OWL ontologies in some suitable form. The forum servers may communicate using RSS. Repository servers may pair up with collaborative (forum/wiki) servers, but that is not necessary. The repositories and the collaboration platforms may or may not share infrastructure at some level; most likely, they are separate. The two subsystems can also work independently, without one another. Each node can also operate standalone.

The forum/Wiki platforms combine formatted (form and editable graph) and free-form (text/read-only image) views to the same ontology entities. The formatted content is for human-machine communication, the text for human-to-human communication. The forms are generated dynamically from ontology URLs to keep communication efficient. The back-end for the formatted views can technically be a TF repository, but not necessarily in the same db instance and most likely not the same model/graph as already established content.

TF Axis2 services
[Figure: Axis2 screenshot listing the TermFactory web services]

TF Axis2 services live

TFServices

The most common way to deploy a service in Axis2 is to copy or drop the service archive (.aar) file into the repository (the services directory), or to deploy the service using the Axis2 Web console. TermFactory Axis2 services can be deployed individually or as an Axis2 service group (TFServices.aar). The services share libraries, so the service group is the better choice when deploying more than one service in one installation: the shared TF libraries get pooled together in one common service archive. You can also deploy a service by extracting the contents of the .aar into the webapps/axis2/WEB-INF/services directory of your servlet engine. (Note that the Axis2 WAR file must be installed in the servlet engine first.) This is known as the "exploded" format. To hot update the service, one can then just edit or replace files in the TFServices directory. Warning: new copies of the Axis2 service lib jars get installed to $CATALINA_HOME/temp at each deployment. This directory is not purged automatically. Search the Web for "axis2 temp files" for details.

QueryService

QueryService serves queries against file or database TF repositories using the new Pellet-based TF query engine Pellet4TF. Compared to v. 0.1, the improvements include the following.

  • version 1.0 uses the Pellet4TF reasoner.
    • choice between the ARQ, PELLET, Mixed, SPARQL and Stacked query engines.
    • supports querying both file and database repositories.
    • TF DESCRIBE query implemented.
    • TF query imports implemented.
    • choice of output format
    • choice of output encoding

QueryService implements the setQuery and getQuery operations. These operations set query bean parameters sent to the service and get them after the query has been carried out, respectively. The fields of the query bean are listed here.

String query; string to be parsed as a SPARQL query string or as a space-separated list of names for documents or resources
String queryType; URL, URI, ANY (same as null)
String answer; returned document
String[] repos; array of repository names
String engine; name of query engine to use (null for default)
String axioms; name of ontology to use for stacked engine reasoner schema
String format; format of ontology document to return from CONSTRUCT or DESCRIBE query
String format2; format of result set to return from SELECT query
String template; XHTML output template
String root; XHTML output root filter
String schema; XHTML output schema
String active; XHTML output active ontology
String original; XHTML output original entry
String edits; XHTML output edited entry
String locals; XHTML output localization ontology
String lang; XHTML output localization language
String links; XHTML output hyperlink map
String[] rewrite; an array of rewrite commands
String matrix; rewrite command matrix ontology
String encoding; character encoding (UTF-16 for Windows or UTF-8 for Linux)
String queryID; query ID to stop federated query loops
String modelID; ID of model in DB
String location; url of document returned by retry
String message; message returned from service
int bits; TF flags coded as bit vector
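
As a rough illustration of how such a bean travels through setQuery and getQuery, the sketch below models a few of the fields listed above as a plain Java bean. The class name QueryBean and the helper effectiveQueryType are hypothetical; the manual does not give the real class name or package, and only a subset of the fields is shown.

```java
// Hypothetical bean class: the real TF class name and package are not given in this manual.
class QueryBean {
    private String query;      // SPARQL text, or space-separated document/resource names
    private String queryType;  // "URL", "URI" or "ANY" (null is treated as ANY)
    private String[] repos;    // repository names
    private String engine;     // null selects the default engine
    private String format;     // CONSTRUCT/DESCRIBE output format
    private String answer;     // returned document, filled in by the service
    private String message;    // message returned from the service

    public String getQuery() { return query; }
    public void setQuery(String query) { this.query = query; }
    public String getQueryType() { return queryType; }
    public void setQueryType(String queryType) { this.queryType = queryType; }
    public String[] getRepos() { return repos; }
    public void setRepos(String[] repos) { this.repos = repos; }
    public String getEngine() { return engine; }
    public void setEngine(String engine) { this.engine = engine; }
    public String getFormat() { return format; }
    public void setFormat(String format) { this.format = format; }
    public String getAnswer() { return answer; }
    public void setAnswer(String answer) { this.answer = answer; }
    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }

    // Illustrative helper (an assumption): mirrors the rule that a null queryType means ANY.
    public String effectiveQueryType() { return queryType == null ? "ANY" : queryType; }
}
```

A client would fill in the request fields, send the bean with setQuery, and read answer and message from the bean returned by getQuery.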

Relayed queries

Instead of letting one query engine do all the work, it should be possible to relay a query through other sites in the TF repository network and collect the results. A relayed query is a query run on the query service of a different repository. Instead of a list of ontologies, the dataset of a relayed query is a list of query services. Instead of fetching all the ontologies in the repositories and running the query locally, this type of query distributes the query to the several repositories to run and merges their results.

Since datasets are identified by URLs in TF whatever their size or location, nothing extra is needed in TF to implement relayed queries. A relayed query is a query whose repository is the result set of another query to a TF query service.

Setting TF_REPOS in a site's TF options file etc/tf.properties defines the default repositories queried from that site (the list is consulted if there is no repos parameter in the query). Here is an example of a relayed query:

http://localhost:8080/TermFactory/query?pattern=Finn&repos=http%3a%2f%2ftfs.cc:8080%2fTermFactory%2fquery?pattern=Country&repos=TFS.owl%0aPlace.owl%0a%0a%2fDomain%2fGeography%2fCountry%2f%0a

What this does is use the local query engine to query for items related to Finns in a dataset composed of (i) the result of a query to the TF schema about countries and (ii) the local ontology index under countries. Note the double newline (percent encoded as %0a%0a) marking the end of the embedded query's repos parameter.
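
A relayed query URL of this kind can also be assembled programmatically. The following is a minimal sketch, not the TF implementation: the class RelayedQuery and its methods are hypothetical, and, unlike the hand-written example above, it percent-encodes the whole embedded query URL (including its ? and &), on the assumption that the receiving servlet decodes the repos value once before dereferencing it. With that assumption the double-newline terminator is not needed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building TF query URLs; not part of the TF code base.
final class RelayedQuery {
    // Percent-encode one query string value (form encoding: a newline becomes %0A).
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Build "<endpoint>?pattern=<p>&repos=<r1>%0A<r2>..."; repository names are joined with newlines.
    static String queryUrl(String endpoint, String pattern, String... repos) {
        return endpoint + "?pattern=" + enc(pattern) + "&repos=" + enc(String.join("\n", repos));
    }

    public static void main(String[] args) {
        // Inner query: ask the remote query service about countries in two ontologies.
        String inner = queryUrl("http://tfs.cc:8080/TermFactory/query", "Country", "TFS.owl", "Place.owl");
        // Outer query: use the inner query's result as the dataset of the local query.
        String outer = queryUrl("http://localhost:8080/TermFactory/query", "Finn", inner);
        System.out.println(outer);
    }
}
```

Because the inner URL is itself a repository name, nesting is just a matter of passing one built URL as a repos value of the next.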

Another observation: since query engines in TF are web services that are also identified by URL, a TF site does not even need to have a local query service. It can use a remote one to query local ontologies as well as remote repositories (provided it has permission to do so). A TF site can also be a virtual construction implemented by a cluster of nodes.

Broadcast queries

A broadcast query in TF is a query that is recursively relayed through a network of sites. Broadcasting in TF is not completely general: each site relays a query only to the further sites that the site itself specifies. The set of sites to which a given site relays a broadcast query is looked up from the site's TF option TF_SITES. Assume the site list contains just the one site http://tfs.cc:8080/TermFactory/query. The local query

http://localhost:8080/TermFactory/query?pattern=China

gets in effect rewritten to

http://localhost:8080/TermFactory/query?pattern=China&repos=http%3a%2f%2ftfs.cc:8080%2fTermFactory%2fquery?pattern=China

This causes the local query engine to first relay the broadcast query to http://tfs.cc and then use the results as a dataset for its own query.

A provision is needed against cycles caused by a query getting sent back to where it has already been. This is prevented by giving a broadcast query a UUID. Each site keeps a list of served query IDs. Queries on the list are ignored. When a query is done, the originator broadcasts an OK to remove the query ID from the network. Broadcasting is the default. It can be explicitly prevented with the query parameter sites=false.
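
The bookkeeping described above can be sketched as a small per-site registry of served query IDs. This is an illustration under stated assumptions, not the TF code: the class ServedQueries and its method names are hypothetical, and the registry is kept in memory only.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-site registry of broadcast query IDs.
final class ServedQueries {
    // Thread-safe set, since a servlet container serves requests concurrently.
    private final Set<String> served = ConcurrentHashMap.newKeySet();

    // Originator side: mint a fresh ID for a new broadcast query.
    static String newQueryId() {
        return UUID.randomUUID().toString();
    }

    // Returns true the first time an ID is seen; false means the query has
    // already been here and should be ignored.
    boolean accept(String queryId) {
        return served.add(queryId);
    }

    // Called when the originator's OK arrives: forget the ID.
    void release(String queryId) {
        served.remove(queryId);
    }
}
```

A site would call accept before relaying a query to its TF_SITES neighbours and skip the relay when it returns false.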

TF resources have URIs that identify the resource's owner, so the default is to get a named resource from the owner. A network broadcast is involved when we don't exactly know what we want. Then the best bet is to search our own and neighboring sites, which in turn may query their neighboring sites. Cycles in the network are stopped with the query ID. For a more general solution, compare federated SPARQL queries.

EditService

The EditService setEdit/getEdit operations implement four main actions: edit, save, delete, and add. The edit action does delete and add in one transaction. An edit operation works on RDF triples. A matrix triple set is edited by subtracting a delendum set from it and/or adding an addendum set to it. Triple sets can be specified by a URL, a Jena model, a W3C DOM tree, or an XHTML string. In EditService, the edit parameters are conveyed in an Edit bean with the following fields.

String op; string parsed as edit operation. A query parameter string containing parameters of form p=name, op=opcode, rw=command, or one of opcodes add, del, ed, pop
String schema; XHTML output schema
String template; XHTML output template
String root; XHTML output root filter
String active; active ontology
String original; original entry (name or contents)
String edits; edited entry (name or contents)
String format; format of ontology document to return
String rewrite; a rewrite command
String locals; XHTML output localization ontology
String lang; XHTML output localization language
String links; XHTML output hyperlink map
String modelID; ID of model in DB
String answer; returned document
String message; message returned from service
int bits; TF flags coded as bit vector

The operation carried out is given in the op field, matching the operations described for edit4tf. The default opcode (assumed when no explicit opcode is given) is ed. For accessing the edit service with a URL see EditForm.
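
The core of the edit action (subtract a delendum set from the matrix, then add an addendum set) amounts to two set operations. The sketch below shows this with plain strings standing in for RDF triples, so that it stays free of Jena types; the class TripleEdit is hypothetical, and blank nodes, which need matching rather than syntactic comparison, are deliberately left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of the edit action on ground (blank-free) triples.
final class TripleEdit {
    // Returns (matrix minus delendum) plus addendum; the inputs are not modified.
    static Set<String> edit(Set<String> matrix, Set<String> delendum, Set<String> addendum) {
        Set<String> result = new LinkedHashSet<>(matrix);
        result.removeAll(delendum); // the "delete" half of the transaction
        result.addAll(addendum);    // the "add" half of the transaction
        return result;
    }
}
```

Passing an empty addendum gives a pure delete, and an empty delendum gives a pure add.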

IndexService

IndexService provides TermFactory webserver directory management using webDAV. It offers various ways of indexing a given ontology file (ontology or entry file) under different descriptions. For discussion, see section on URI conventions.

The webDAV protocol does not support filesystem links. As a substitute, TF IndexService creates a directory for the (slash vocabulary) resource URI and puts in it a php directory index file index.php which forwards user agents to the actual resource document using a http redirect header.

TF IndexService can be used to upload an xhtml entry produced by the query service, say ctryCode.xhtml for resource http://tfs.cc/ont0/ctryCode, to a mirroring server grapson.com's TF webdav directory using the webdav url http://grapson.com/webdav/tfs.cc/ont/ctryCode.xhtml. The file is then accessible for reading and writing at grapson as http://grapson.com/tfs.cc/ont/ctryCode.xhtml. In addition, a php redirection file (link file) http://grapson.com/tfs.cc/ont/ctryCode/index.php is created, which makes the directory url http://grapson.com/tfs.cc/ont/ctryCode fetch the same xhtml file. Further such php links to the file can be created elsewhere for multiple indexing. Alternatively, the grapson.com webserver is told to rewrite http://tfs.cc/ont/.* to http://grapson.com/tfs.cc/ont/*.xhtml.

In the home server tfs.cc, the directory /ont can be redirected or linked to a webdav directory http://tfs.cc/webdav/ont. This makes the TF resource uri http://tfs.cc/ont/ctryCode directly web addressable and web editable as a TF entry, in conformance with web addressing orthodoxy. For hash vocabularies, the only blemish is the necessity to substitute a slash for the fragment hash, in order to have a resolvable URL visible to web servers in the first place.

IndexService implements the setIndex and getIndex operations. These operations set query parameters in the bean sent to the service and get them after the query has been carried out:

String modelID;
String fromURL;
String toURL;
String contents;
String format;
String encoding;
String answer;
String message;

The operation depends on the parameters given. If modelID and toURL are given, the model cached at modelID is uploaded to toURL. Else if contents and toURL are given, the contents are uploaded to toURL. Else if contents and modelID are given, the contents are cached in DB.
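
The parameter-driven choice of operation can be written down as a simple decision function. This is a sketch only: the class IndexDispatch, the Action names, and the treatment of empty strings as "not given" are assumptions for illustration, not the actual IndexService code.

```java
// Hypothetical illustration of how IndexService picks an operation from its bean fields.
final class IndexDispatch {
    enum Action { UPLOAD_MODEL, UPLOAD_CONTENTS, CACHE_CONTENTS, NONE }

    // Assumption: a parameter counts as given when it is neither null nor empty.
    static boolean given(String s) {
        return s != null && !s.isEmpty();
    }

    // The rules are tried in the order stated in the text; the first match wins.
    static Action choose(String modelID, String toURL, String contents) {
        if (given(modelID) && given(toURL)) return Action.UPLOAD_MODEL;      // cached model -> toURL
        if (given(contents) && given(toURL)) return Action.UPLOAD_CONTENTS;  // contents -> toURL
        if (given(contents) && given(modelID)) return Action.CACHE_CONTENTS; // contents -> DB
        return Action.NONE;
    }
}
```

Note that when all three parameters are given, the first rule applies and the cached model, not the supplied contents, is uploaded.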

Notes on the installation of IndexService in Ubuntu:

  • activate dav module in apache (to make server serve dav directories)
  • create a directory in webserver for webdav
  • set up authentication (create http user and password for the webdav directory or use ldap)

GateService

TF GateService user management is based on the user management of the GlobalSight open source TMS (translation management system). TermFactory users are maintained in and managed through a GlobalSight server instance with TF specific extensions. GlobalSight user management is based on an "ontology" of users, companies, locale pairs, activity types, and roles. Users belong to a company, and can have one or more roles in the company. A GlobalSight role is identified by company, activity type (for instance, translation, revision, terminology), and locale pair (for instance, from English to Chinese). TF adds one more attribute to the list, called domain. A TF domain is a regular expression on ontology URIs; for instance, "http://tfs.cc/.*" matches all TermFactory ontology URIs.

The TF extension of GlobalSight is packaged as a GlobalSight patch that can be installed on top of a current GlobalSight release (the current version is 8.2). The patch adds a new optional attribute type "tfdomain" to the GlobalSight LDAP schema, and a corresponding column TFDOMAIN in the GlobalSight relational database table USER_DEFAULT_ROLES. Requisite changes are made to the GlobalSight front end and middleware to support viewing and editing domain fields on roles.

Below is a sample user role entry from GlobalSight LDAP directory. The activity type determines the type of permission and the value of the tfdomain attribute its domain. Currently, the locale pair is not used to constrain access to ontologies by locale, but it might at some point.

# 57 EditTFS_3 fi_FI fi_FI http://foo/bar KOE, Groups, globalsight.com
dn: cn=57 EditTFS_3 fi_FI fi_FI http://foo/bar KOE,ou=Groups,o=globalsight.com
roletype: U
sourcelocale: fi_FI
uniqueMember: uid=KOE,ou=People,o=globalsight.com
status: ACTIVE
targetlocale: fi_FI
objectClass: top
objectClass: groupOfUniqueNames
objectClass: localizationrole
cn: 57 EditTFS_3 fi_FI fi_FI http://foo/bar KOE
tfdomain: http://foo/bar
activitytype: EditTFS_3

Since TF user management is delegated to GlobalSight, the TF GateService can remain quite simple. All it needs is a stripped-down version of the GlobalSight user management API to check access, read user information and read/write permits from the GlobalSight back end.

GateService implements the setGate and getGate operations. These operations respectively send login parameters and receive an access token, or send an access token and receive user information and read/write permissions.

String site;
String user;
String pass;
String token;
String uinfo;
String readPermit;
String writePermit;
String message;

The GateService is currently used by QueryForm and EditForm to log in a user, by QueryService to filter query repositories, and by EditService to check write permission on the current active ontology before trying to commit edits. At present, all domains qualify for read permissions, while only domains in roles whose activity type contains the word "edit" (case insensitive) qualify for write permissions.
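
The permission rule just stated can be illustrated with a few lines of Java. This is a hedged sketch, not the GateService code: the class GateCheck is hypothetical, and it assumes that a tfdomain regular expression must match the whole ontology URI (the manual does not say whether a partial match suffices).

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical check of one role (activity type + tfdomain) against an ontology URI.
final class GateCheck {
    // Read permission: any role whose domain matches the URI qualifies.
    static boolean mayRead(String tfdomain, String ontologyUri) {
        // Assumption: the domain expression must match the entire URI.
        return Pattern.matches(tfdomain, ontologyUri);
    }

    // Write permission: the domain must match and the activity type must
    // contain the word "edit", compared case-insensitively.
    static boolean mayWrite(String activityType, String tfdomain, String ontologyUri) {
        boolean editRole = activityType.toLowerCase(Locale.ROOT).contains("edit");
        return editRole && mayRead(tfdomain, ontologyUri);
    }
}
```

With the sample role above, activity type EditTFS_3 passes the "edit" test, so the role grants write access to whatever its tfdomain matches.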

SparqlService

TF SparqlService is a SPARQL protocol compliant query endpoint for TF. The W3C SPARQL protocol recommendation defines the SPARQL web service protocol by way of a WSDL 2.0 schema document sparql-protocol.wsdl. The protocol also specifies valid query requests, results and faults (errors) with the XML schema protocol-types.xsd. (This schema imports further schemas for its subelements.)

The wsdl definition is meant to make possible automatic generation of service and client implementations in different programming languages from the specification. The xml schemas for data formats similarly serve automatic translation of associated data into xml and back for transmission over the web and are used for data validation.

TF SparqlService was generated from the WSDL specification using the Axis2 WSDL2Java code generator as follows.

  1. Downloaded the Axis2 standard distribution (version 1.5.4) that contains the code generator.
  2. Added the following service endpoint inside the protocol-query.wsdl description element. (The code generator throws a null pointer if there are no endpoints in the wsdl.)
    <service name="SparqlService" interface="tns:SparqlQuery">
      <endpoint name="SparqlServiceHttpGet" binding="tns:queryHttpGet" address="http://localhost:8080/axis2/services/SparqlService.SparqlServiceHttpEndpoint/">
      </endpoint>
    </service>
  3. Generated service skeleton files using the WSDL2Java utility. (The classpath contains . and the axis2 library jars.)
    . classpath && java org.apache.axis2.wsdl.WSDL2Java -uri protocol-query.wsdl -p com.grapson.tf.ws.service -wv 2 -d jaxbri -ss -sd -ssi
  4. Deleted occurrences of word Skeleton from the generated source and stuck the TF query engine into SparqlService.java. (Needed to write a class Jena2Sparql to convert Jena result set to sparql protocol format.)
  5. Added TF libraries to the generated build.xml and built the service .aar archive with command ant
  6. Deployed the service by copying the build/lib/*.aar file to CATALINA_HOME/webapps/axis2/WEB-INF/services
  7. Checked http://localhost:8080/axis2/services/listServices to make sure the service has been properly deployed.
  8. Tested the service with a query URI, e.g.
    http://localhost:8080/axis2/services/SparqlService/query?query=SELECT+?inst+?class+WHERE+{?inst+rdf:type+?class}&default-graph-uri=http%3A%2F%2Ftfs.cc%2Fowl%2FTFS.owl
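
The test URI in the last step can also be produced in code, which takes care of percent-encoding the braces and question marks of the SPARQL text. The sketch below uses the query and default-graph-uri parameter names defined by the SPARQL protocol; the class SparqlGetUrl is hypothetical, and the endpoint address is the local one used in the steps above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a SPARQL protocol GET request URL.
final class SparqlGetUrl {
    static String build(String endpoint, String sparql, String defaultGraphUri) {
        return endpoint
                + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8)
                + "&default-graph-uri=" + URLEncoder.encode(defaultGraphUri, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build(
                "http://localhost:8080/axis2/services/SparqlService/query",
                "SELECT ?inst ?class WHERE {?inst rdf:type ?class}",
                "http://tfs.cc/owl/TFS.owl"));
    }
}
```

The resulting URL can be pasted into a browser or fetched with any HTTP client to exercise the deployed service.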

EditForm servlet

The TF webservices editing API supports saving a model in the database under a model ID and deleting a model from it by model ID. There are also methods for deleting a model (specified as a URI, a Jena model, or the DOM tree of a TF2XML format document) from a given DB model and adding such a model to another DB model.

The least platform dependent way to implement collaborative structured editing (e.g. on a Wiki platform) is to let the platform provide the editor and just read the result as an xml element off the edited page. TF provides a transform back from TF2XML to jena model with a jena model reader XML2TFReader. The TF Edit web service allows editing a persistent model (set of triples) by adding or deleting another model to it.

A terminology query can produce content from several different repositories through relays. Edits should only be allowed on the currently active ontology, which should be one to which the Wiki has edit permissions. Some solution like the Protege facility of choosing the active ontology among the ontologies shown (greying the rest) is indicated. OWL 2 has only just begun to support per-statement source annotations, using a variant of the RDF reification (quad) technique. It takes four more triples to annotate a triple in the ontology. Pending a more efficient technique for source indication (Ontotext has one), it seems easier to do a second query for editable content from the active ontology and use that to single out the editable elements at the edit interface level.

Editing triples about named resources is straightforward enough. Blank triples are more problematic. A brute force way to edit blanks is to rewrite them with universal resource names for the duration of the editing. Then they are treated as named resources and cause no problems. After editing, they are written back as blank nodes. This works as long as the blanks all come from the same active model. In general, however, edits may be the result of querying some other dataset. Then there is no way but to match blanks across models, that is, to do RDF reasoning.

The problem is that there is no direct way to identify blank nodes between models. Blanks represent existential quantifications over nodes. If a user adds a new blank statement, there is no way to prevent duplicates without doing some reasoning. It takes a reasoner to decide whether the active ontology entails a given blank triple in the edits (in the sense of RDF entailment). Moreover, the identification need not be unique, and we cannot tell for sure just what the result of the editing should be. Editing normally happens under syntactic identity, and blanks have not got it under RDF semantics.

At the present time, the editor has a blank node matcher that matches blank nodes in the edited model with blanks in the active model under subgraph isomorphism. The matcher is used in the following ways. First, a connected set of blank triples in the edits is editable just when it is matched with one in the active model. Second, when a model being subtracted from the active model contains a set of blank triples, one matching set of blank triples (if any) in the active model is removed. Third, if a connected set of blank triples in a model being added to the active model is not matched in the active model, those blank triples get added to the active model.

Warning: Some ontology databases may merge a model with its imports. When fetching the active model, one should guard against getting a version that includes the imports (unless that is what one wants).

The TF Jena model XHTML writer can be given the active model as a parameter. Given this parameter, the writer marks those triples that come from the active model with attribute class="editable" and those that do not with class="readonly". An editing tool can then restrict editing to the active ontology, so that a subsequent save of the edits can be included in the right ontology. (Compare the forthcoming HTML 5 contenteditable attribute.)

The Axis2 web API passes all data (request parameters, TF settings, edits, and messages) in an XML bean. With the servlet API, it is less straightforward to return side effects like messages. There is no problem if the request does not return edits, for instance, a save or copy request. In this case, the message is returned as the contents of the response page. If the return page is XHTML, the XHTML writer includes service messages and other metadata in the XHTML header. If the return page is RDF, messages and other metadata are included in the RDF triples as properties of a blank node of type meta:Entry. Conversions between formats keep the metadata. In editing an active model, the rule is the same as with other edits: metadata specific to the original gets removed, and metadata specific to the edits gets added. Metadata shared by the original and edited entry is left as is. The XHTML writer omits subjects with rdf:type meta:Entry unless meta:Entry is explicitly included in the root filter.

Besides these basic editor capabilities, the TF editing support shall provide a set of widgets that can be used in enriching native X(HT)ML editing in third party platforms. Such widgets can include picklist contents and/or menus for choosing existing resources such as properties and values cached from repository content. To manage large selections of values (typically object property values), an autocomplete facility can be more useful than menu picklists. In any case, what the back-end edit API shall provide is a method to generate picklist and autocomplete contents from ontologies using appropriate TF queries.

TF web application

The TermFactory web application is a sample client for the TF Axis2 web service engine TFServices. In the demo implementation, it is an Apache Tomcat servlet accessible at address http://localhost:8080/TermFactory/ . The url http://localhost:8080/TermFactory/query starts an interactive form. The same address with an uri or pattern parameter serves ontology requests. In Tomcat, the TermFactory.war archive file gets extracted to the Tomcat WEBAPPS directory at deployment. To hot update the application, one can edit or replace files in this directory.

QueryForm servlet

Big web service queries can be made with the QueryService Axis2 operations setQuery/getQuery.

The TF QueryForm servlet provides a RESTful endpoint for the TF back-end query service.

With parameter form=true, it serves a standalone query form.

TF query URIs can range from simple ontology-fetch or term-describe URIs to URIs containing literal SPARQL query strings. Ontology fetch is implemented with the location mapping retry facility. The retry facility and the associated location mappings allow defining a sequence by which content is queried from alternative data sources: web documents or databases, ontologies or cached term entries. QueryForm HTTP GET returns a query form if it has no query string or the query parameters include form=true, and a result page for the given query string parameters otherwise.

A TF QueryForm url without parameters like http://localhost:8080/TermFactory/query opens QueryForm with default settings. The form can be started with user defined initial settings using a url like http://localhost:8080/TermFactory/query?form=true&... where the subsequent parameters are some of the TF query parameters below.

QueryForm parameters

Following is a listing of the parameters understood by QueryForm. Recognized abbreviations are given in parentheses. For those parameters that allow multiple occurrences in a query string (currently repos and rewrite), all supplied values are applied. For the rest, the last occurrence wins. Note that the QueryForm parameter names 'url' and 'uri' are misleading. The ontology document and the repos can be anything resolvable to URL/s by the TF retry facility. The ontology resource/s can be anything resolvable to URI/s by the TF localization facility.

  • Query parameters
    • url an ontology document
    • uri one or more ontology resources
    • queryURL (q) address of a SPARQL query
    • query text of a SPARQL query
    • pattern (p) a regular expression
    • repos (r) list of repositories separated with newlines (percent coded as %0a). Multiple.
    • queryType type of query
      • URL a URL query
      • URI a URI query
      • SELECT a SELECT query
      • CONSTRUCT a CONSTRUCT query
      • DESCRIBE a TF DESCRIBE query
      • ANY (or null, default) any TF query
  • engine (e) TF query engine to use
    • ARQ
    • PELLET
    • Mixed
    • SPARQL
    • Stacked
  • format (f) CONSTRUCT or DESCRIBE format
    • RDF/XML
    • XHTML
    • TF3
  • format2 (f2) SELECT format
    • XML
    • JSON
    • Tabular
  • encoding character encoding
    • UTF-16
    • UTF-8
  • DB settings
    • queryDB query database. Default false.
    • cacheDB save to database. Default false.
    • notry no location mapping. Default false.
  • XHTML settings
    • template (t) output template
    • root entry root (one or more instances or classes)
    • schema (s) ontology schema. Bridges third party vocabulary to TF. Default TFS.owl
    • active editable ontology. No default
    • original deleted ontology. No default
    • locals localization ontology. Default TFL10n.owl
    • links link mapping file. Default etc/links.n3
    • lang localization language code. Default en.
  • rewrite (rw) rewrite operation name. Multiple.
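
The rule that repos and rewrite accumulate while every other parameter keeps only its last occurrence can be sketched as a small query string parser. This is an illustration, not the QueryForm servlet code: the class QueryParams is hypothetical, and the abbreviation table simply copies the abbreviations listed above.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical parser applying the QueryForm multiplicity rule.
final class QueryParams {
    // Only these parameters keep all supplied values.
    static final Set<String> MULTIPLE = Set.of("repos", "rewrite");
    // Abbreviations as listed in the parameter table.
    static final Map<String, String> ABBREV = Map.of(
            "q", "queryURL", "p", "pattern", "r", "repos", "e", "engine", "f", "format",
            "f2", "format2", "t", "template", "s", "schema", "rw", "rewrite");

    static Map<String, List<String>> parse(String queryString) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String rawName = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
            name = ABBREV.getOrDefault(name, name); // expand abbreviations
            String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
            List<String> values = out.computeIfAbsent(name, k -> new ArrayList<>());
            if (!MULTIPLE.contains(name)) values.clear(); // last occurrence wins
            values.add(value);
        }
        return out;
    }
}
```

For example, parsing p=Finn&lang=en&lang=fi&rw=deblank&rw=relabel keeps a single lang value (fi) but both rewrite operations, in order.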

The TF query engine and the Axis2 service web API for it have three built-in query types: URL, URI, and ANY.

A TF URL query looks for a file of the given name using location mappings to try alternative locations (URLs).

A TF URI query looks for an up-to-date version of an entry for the given resource/s and returns it in the specified format. Each resource is first tried with retry to look for a ready-made entry for the resource. Resources that have not got an entry are described with a DESCRIBE query from the given repositories.

The QueryForm servlet provides a few more queryType options. The SELECT, CONSTRUCT and DESCRIBE queryType options in the query form send a hardcoded sample SELECT, CONSTRUCT or DESCRIBE query, respectively, to the query service. The hardcoded queries reside in class QueryPatterns and can be modified by overriding or rewriting that class. It will not be difficult to add further predefined query types as the need arises.

Sample TF query string parameters

Here is a table of sample TF query string parameters.

Query string Explanation
?url=http%3a%2f%2ftfs.cc%2fowl%2fTFS.owl Fetch TFS.owl using local TermFactory service's location mappings
?uri=ont0:ctryCode Fetch or construct an entry for TFS instance uri ont0:ctryCode
?q=http%3a%2f%2ftfs.cc%2fsparql%2fconvert.sparql query using sparql script convert.sparql
?q=DESCRIBE+%3finst+WHERE+%7b+%3finst+rdf%3atype+ont%3aConcept+%7d query using script DESCRIBE ?inst WHERE { ?inst rdf:type ont:Concept }
?engine=SPARQL use SPARQL query engine
?format=XHTML return results in XHTML format
?active=http%3a%2f%2ftfs.cc%2fTFS.owl mark up contents of TFS.owl as editable in XHTML entry
?original=http%3a%2f%2ftfs.cc%2foldTFS.owl mark up contents of oldTFS.owl as deleted in XHTML entry
?links=http%3a%2f%2ftfs.cc/etc/links.n3 map resource hyperlinks in XHTML using this location map
?locals=http%3a%2f%2ftfs.cc%2fTFL10n.owl get XHTML entry localisation strings from this URL
?lang=fi localise into Finnish
?rw=relabel use TF rewrite utility to relabel resources in the result model.
?rw=deblank use TF rewrite utility to remove blank nodes from the result model.
?rw=reblank use TF rewrite utility to restore blank nodes to the result model.

Later versions of the service might allow setting further parameters, e.g. the TF DESCRIBE query to use.

QueryForm might write some message into an HTTP header.

For another RESTful terminology query interface, see ISOcat web services. For instance, the URI http://www.isocat.org/rest/user/guest/search?keywords=chinese returns a rated listing of hits related to the string Chinese in ISOcat, with links to numbered ISOcat entries like http://www.isocat.org/rest/dc/613.

EditForm servlet

The TF EditForm servlet provides a RESTful endpoint for the functions of the TF back-end edit service.

With parameter form=true, it serves a standalone editor.

The edit service can be called through the EditForm with query parameters as follows:

http://localhost:8080/TermFactory/edit?active=test.tf3&original=test2.tf3&edits=test3.tf3

The request does the edit operation, which returns the result of editing active model test.tf3 with original test2.tf3 and edits test3.tf3 as an XHTML document. The order of the parameters is arbitrary in this request format.

More complicated edit requests can be made by specifying a stack of desired edit operations with query parameters p and op (currently one of del, add, ed). The operations are carried out in reverse Polish fashion:

http://localhost:8080/TermFactory/edit?&p=.&p=.&op=del&p=.&op=add

This deletes the model indicated by the second p parameter value from the first and then adds the third one to the result. The edit operations can be interspersed with rewrite operations in the form of rw=<operation>. Each rewrite pops the model on top of the stack and pushes the rewritten model back in its place.
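
The reverse Polish evaluation can be sketched with an explicit stack. The following is an illustration under stated assumptions rather than the EditForm implementation: the class EditStack is hypothetical, models are represented as sets of strings looked up by their p value, and only the del and add opcodes are covered (ed and rw are left out).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical evaluator for a sequence of p=... and op=... query parameters.
final class EditStack {
    // steps: ordered {name, value} pairs as they appear in the query string.
    // models: assumed lookup from a p value to that model's set of triples.
    static Set<String> run(List<String[]> steps, Map<String, Set<String>> models) {
        Deque<Set<String>> stack = new ArrayDeque<>();
        for (String[] step : steps) {
            String name = step[0], value = step[1];
            if (name.equals("p")) {
                // Push a copy so the operations below never modify the source model.
                stack.push(new LinkedHashSet<>(models.get(value)));
            } else if (name.equals("op")) {
                Set<String> top = stack.pop();   // second operand
                Set<String> below = stack.pop(); // first operand
                if (value.equals("del")) below.removeAll(top);
                else if (value.equals("add")) below.addAll(top);
                else throw new IllegalArgumentException("unsupported opcode: " + value);
                stack.push(below);
            }
        }
        return stack.pop();
    }
}
```

For the parameter sequence p=A, p=B, op=del, p=C, op=add this computes (A minus B) plus C, which is the behaviour described above.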

EditForm parameters

Following is a listing of the parameters understood by EditForm.

  • submit type of action
    • Query do a query
    • Save do a save
    • Edit do an edit
    • Upload do an upload
  • form type of response
    • true a form
    • false a page
  • Query parameters
    • queryString ontology document or resource/s according to queryType checkbox
    • queryType type of query
      • URL a URL query
      • URI a URI query
  • format output format
    • SELECT formats
      • XML
      • JSON
      • Tabular
    • CONSTRUCT or DESCRIBE formats
      • RDF/XML
      • XHTML
      • TF3
  • encoding character encoding
    • UTF-16
    • UTF-8
  • DB settings
    • queryDB query database. Default false.
    • cacheDB save to database. Default false.
    • notry no location mapping. Default false.
  • XHTML settings
    • schema ontology schema. Bridges third party vocabulary to TF. Default TFS.owl
    • template output template
    • root entry root filter
    • active name or content of editable ontology. No default
    • original name or content of deleted ontology. No default
    • locals localization ontology. Default TFL10n.owl
    • lang localization language code. Default en.
    • links link mapping file. Default etc/links.n3
  • rewrite rewrite option
    • deblank replace blanks with URNs
    • reblank restore URNs as blanks
    • relabel create descriptive labels
  • Edit parameters
    • source source URL or string
    • original original URL or string
    • edits edits URL or string
    • op edit operation
      • del delete
      • add add
      • ed edit
  • fromURL URL to download from
  • toURL URL to upload to
  • message service message

The EditForm doQuery action implements two of the query types defined in QueryForm, viz. query for a document by address, and query for one or more named resources (preferably by address, failing that, by description from given repositories). The doSave action uploads the edit area contents (as an XHTML string) to a default location (by default, a WebDAV directory). With appropriate location mapping settings, a subsequent query for the source of the edits produces the last saved version. (To revert to the original version, query the original location with location mappings turned off.) The doEdit action validates the contents of the edit area (round-trips the HTML through RDF). If an active ontology is specified, it updates the active ontology with the edits. If the active ontology comes from a web document, the edited version of the active ontology is saved in the default output format to a default location (by default, a WebDAV directory).

Logging

The TF back-end Java code uses Apache log4j logging. Some third party code uses Java logging. To change the logging settings of command line scripts, edit property files etc/log4j.properties and etc/logging.properties for log4j and Java logging, respectively. To change a webapp's logging behavior under Tomcat, edit the log4j.properties and logging.properties files in $CATALINA_HOME/webapps/TermFactory/WEB-INF/classes . The default Tomcat logfile is $CATALINA_HOME/logs/catalina.out . For Axis2 logging, use ${CATALINA_HOME}/webapps/axis2/WEB-INF/classes/commons-logging.properties to select no logging, simple logging or log4j. Then edit ${CATALINA_HOME}/webapps/axis2/WEB-INF/classes/logging.properties or ${CATALINA_HOME}/webapps/axis2/WEB-INF/classes/log4j.properties , depending on the logger. There may be problems with server logging that have to do with how Java classes get loaded under Axis; see http://www.mail-archive.com/axis-user@ws.apache.org/msg34318.html.

User interfaces

This section describes the TF front end user interfaces.

The mockup

At the outset of the TermFactory project, the plan was to develop specific dedicated platforms for TermFactory for querying, browsing, editing and discussing terms. There were a lot of promising platforms to borrow from, and many of them were surveyed for adaptation to TermFactory. With time, it became apparent that such platforms are a rapidly moving target. The very existence of many popular platforms competing for clientele made it less enticing to have to get trapped in any one of them. TermFactory is too generic a concept to be imprisoned in some particular user interface. Now, the focus is to provide plugins and mashups that can be embedded in a variety of present or future platforms.

At this time (Jan 2012), it seems that what little will there was for common standards (thanks to the common enemy Windows) is wilting away. It is a free-for-all, a new dogfight where the big players are scrambling for position in the brave new world of handheld devices. It is to be hoped that no one wins. Meanwhile, it seems best not to stake much on the user end, since that is going to change very fast.

The initial concept of at least four main use scenarios: Query, Browse, Edit, and Discuss, is still there. Therefore it still makes sense to start discussion with the original mockup.

Front end mockup

An early mockup of TermFactory specific web front end interfaces is shown below.

The current notion is that TF provides plugins and services for third party platforms.

MediaWiki and Drupal are being used as testbeds.

A dedicated user interface of this kind was originally planned.

The current TF provides tools for implementing such user interfaces on different platforms.

TF mockup

TF-mockup2.pdf is the same mockup in pdf format. Details of the screens are shown below.

In addition to the four views shown here, there may be simplified layouts for simple term search and entry, as well as text oriented TFPedia content, following the spirit of Wikipedia. Entry to TermFactory web can be controlled by registration and login, as usual. Different user categories should be distinguished with different credentials for seeing or modifying content.

In retrospect, the four functions: browse, view detail, edit, discuss, stay with us. The main difference to the plan is that the association of tasks and views is much less cut and dried now, thanks to connecting and mashing up third party tools and platforms all served by the TermFactory back end. Many tools and platforms provide more functions than one. Some functions are served by many platforms, with often useful variations. This plurality is part of the web experience.

TermFactory Search

TermFactory Search shows a term browser on the left half of the screen. The fields shown on the screen and the sort order of the results can be modified by the user, much in the same way as in email programs. The right half of the screen has a fill-out search form and a SPARQL query editor, both for searching TF repositories. The results of a query are shown in the browser. Naming and saving search filters and SPARQL queries by user profile should be added (not yet shown here).

TermFactory View

By clicking entries shown in the browser, the user can view them in more detail in TermFactory View. The left hand side of the screen arranges term information in a way standard in terminographical entry layout. The Term section shows the different (cross-language) synonyms for a given concept. The Concept section shows a fragment of a concept's neighborhood in the ontology. The Description section shows textual fields related to the concept. Naming and saving views by user profile should be added (not yet shown here).

At the margins of the entry section, there are the vote and comment forms. A user can give immediate feedback about items shown by voting them up or down at this stage. She can also make comments or suggestions through the Suggest/Comment form at the bottom of the page. These comments and suggestions will be stored in her local TF forum and spread to the other sites through its RSS feed.

On the right half of the screen, headings of discussion items fetched from TermFactory sites related to the chosen item(s) are shown. These items are fetched along with the terminology content proper in connection with a repository query (filtering through TF site RSS 1.0 streams or in other ways). The query language allows the user to control this part of the search as well.

TermFactory Forum

If the user wants to check the discussion around an item, she can move to TF Forum. The layout of TF Forum follows the usual news forum layout. The main novelty is that threads are associated with TF resources (URLs or URNs). Users can also vote on other discussion contributions. As usual in user moderated forums, such votes can then accrue to the account of the authors of the contributions. The "karma" thus accruing to authors of successful contributions can be exploited, e.g., to sort messages, to evaluate new suggestions, to help site moderators, or even to automate some aspects of the certification process. Naming and saving forum settings by user profile should be added (not yet shown here).

Again, we do not want to get stuck with any specific forum tool. Some recent commenting tools like Disqus allow inserting comment elements mashup style on various platforms and support relaying comments between platforms through a common store using RSS. This fits the TF distributed design well.

In sum, the solution should provide the following.

  1. forum contribution format(s) that allow(s)
    • referencing forum contributions by contributor and TF resources commented on
    • entering explicit proposals in contributions in a format that is read-write consistent with other TF formats
    • including and extracting resource references and/or entry snippets on RSS messages
    • entering votes associated to proposals on contributions
  2. forum implementation(s) that offer(s)
    • Solution for storing and retrieving stored forum contributions (at least) by contributor and TF resource reference
    • Solution for collecting votes from forum contributions associated to contributor and TF resource reference
    • Methods to include forum links and mashups on other TF platforms

Term suggestions, discussions and votes are managed by the external platform(s) and not by the TF repo(s). If discussions and votes are accessible to TF at external sites in RDF form (say, through RSS 1.0 feeds), then these sites can be included in a TF query, and TF serves them together with queried TF contents.

TermFactory Wiki

In a professional terminology use scenario, TF repository contents are updated by a selected set of experts on the basis of community input as a separate stage of the workflow. That is, suggestions, discussions or votes are not stored on TF repos nor do they affect other TF content directly. Some TF updates might be made automatic on the basis of community polls, but that requires coding a separate workflow mechanism; it is not built in.

More precisely, free form discussion of terms and verbal descriptions for human use not leaving the Wiki platform can be free for all. But structured editing of TF entries and acceptance for inclusion to a repository is to happen by qualified people only. Unlike the discussion and general information sections, and forum discussions, such sections of such a TF Wiki are not open for all to write. Once a steady state or agreement has been reached by the Wiki participants, the result of the edits should be saved on a repository. Besides free form text, the results of the editing should thus be eventually brought to a form which conforms to TF ontology format. A term entry is fetched from a database repository. (For now, at least, more global editing of ontologies is supposed to happen offline.) The Wiki page can use TF Retry to fetch a term for editing. The entry is rendered on the wiki page in some editable form. Changes to the entry are saved back to the repository.

TermFactory Wiki is a collaborative editing platform. Like Wikipedia, it has Edit and Discuss tabs for changing object content and meta discussion about the content, respectively. Like Collaborative Protege, it may have a Chat section for real time chat between editors. Editing TermFactory entries on a Wiki is not open for all, but only for an accredited team of editors. (It can be a big and distributed team, nevertheless.) TermFactory Wiki does not manage edit locks but trusts simultaneous users to manage commits between them. The chat keeps a list of currently logged-in users and their activity. Contributions of non-editing users start as suggestions in the forum and follow the longer community accreditation (discussion and voting) process. Some aspects of this process can be automated, for instance, promoting translation suggestions to a repository once they have a sufficiently high vote/status (an editor or other user with sufficient authority may be needed to cast a decisive vote). The discussion tab can be less restricted. A channel for exchanging information between TF Forum and the TF Wiki discussion page must be provided. Naming and saving settings by user profile should be added (not yet shown here).

In sum, to edit TermFactory entries on a wiki platform for submission to a repository, we may want all or part of the following.

  1. layout orientation and rendering
    • X(HT)ML schemas for alternative entry layouts
      • concept oriented entries
      • term oriented entries
      • lemma oriented entries
    • A library of java/javascript methods that implement orientation
      • Java ontology writers and readers
      • javascript DOM transformers (if needed)
    • Stylesheets for rendering the same
      • CSS
      • XSLT/FO (if needed)
  2. resource-guided editing:
    • A library of queries for collecting resource listings from the repository network:
      • input completion (for object properties - instances of property range class)
      • menu picklists (for data properties)
    • A library of javascript methods that implement the facilities:
      • input completion (for object properties - instances of property range class)
      • menu picklists (for data properties)

TF Wikis may be connected to the repository back end through a structured entry editor, shown as an editable area on the Wiki page.

TF Wikis may be connected to TF forums through a comment collection mashup component like Disqus.

The actual front end

This section covers the actual TermFactory front end toolkit. A noteworthy feature of the TF approach is that we try to get by with as few TermFactory specific adaptations as possible, so that the tools remain usable for their original purposes as well. Here are a few cases in point:

The TF editor is implemented as a plugin for the open-source JavaScript WYSIWYG editor CKEditor.

  • TF XHTML entries are a special case of XHTML. CKEditor is a generic HTML editor. Thus it can be used to edit any HTML content besides TF entries.
  • TF entries are a special case of RDF/OWL models. The back end is able to convert any RDF/OWL model into XHTML. Thus the TF Editor can be used as a general purpose RDF/OWL ontology editor.
  • The TF default term entry template is a special case of entry template. The editor can be adapted to other formats, like the WordNet lexicon format, by changing editor and back end parameters.

Login

TermFactory user management is described in the section on the TF gate service. A front end to the gate service is the GateForm servlet. TermFactory QueryForm and EditForm forward to the GateForm servlet to check user credentials. GateForm then makes a new request to the original form with the user's credentials. GateForm can be run alone with the webapp URL TermFactory/login. Currently, the GateForm form just checks the login and shows some user information.

Search

QueryForm
Query form

With parameter form=true, the TF QueryForm servlet provides a standalone query form. Option lang=fi produces a Finnish version. The TF query form exemplifies the use of the TF QueryService in a web application and illustrates the functions of the query service.

A number of likely query types have been built into the query form from among the more or less boundless variety expressible in the SPARQL query language and supported by the TF query engine.

The fetch document/s query type fetches one or more documents identified by URL or by a location mapped TF address using the TF retry facility. With the no routing option, TF location mappings are not used. TF built in naming conventions, including TF built-in pseudo schemes (tdb+ and rdb+) and relative file url resolution with respect to TF_HOME are still applied. The no routing option applies to all document retrievals during a query execution, including repository locations and XHTML parameters.

The default assumption is that the documents fetched by a URL query are RDF documents in one or another of the formats understood by TF. If the fetched document contains an ontology that imports other ontologies, then they are included in the query result. The output format of the combined graph can be selected from the graph format radio buttons. If the requested format is different from the source, the source is read into a model and rewritten in the requested format.

With the no imports option, ontology imports are not read and included in the result. If, moreover, the requested format matches the original format (or format any is checked), the document is downloaded as is. An individual document (RDF or not) can be fetched and downloaded as is with URL query settings no imports and format any. Alternatively, the download address and submit button can be used for the same purpose.

The accumulate checkbox joins the contents of the document at the download address to the result of the query. Together with the to file option, it allows collecting a complex query result incrementally with a sequence of simpler queries.

If to file is not checked, QueryForm tries to return the query result in a response page while you wait. If the query takes too long, check the to file checkbox. Then the form returns immediately with the user selected or a machine generated model ID. The query results will be saved to the displayed download address in due time. They can be fetched from the download address with the download button when the query is complete.

The fetch or describe resource/s query option is a mix of the fetch document query and a TF DESCRIBE query. The input line can contain one or more whitespace separated TF resource names (URIs, prefixed names whose prefix is defined in the repositories, or nicknames defined in location mappings). Each such name is first looked up as an address for a document as in the fetch document query type. This allows precomputing the results of a common or complex describe query for X into a canned entry X. A DESCRIBE query is carried out for those resources that had no associated document. (A fresh DESCRIBE query for a list of resources X can be requested by entering DESCRIBE X in the query textarea.)
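The lookup order described above can be sketched roughly as follows. The two callables stand in for the TF back end and are purely hypothetical: fetch_document is assumed to return None when no document is found at a name, and describe is assumed to accept a list of names.

```python
def fetch_or_describe(line, fetch_document, describe):
    """Resolve each whitespace separated name: document first, DESCRIBE second.

    fetch_document(name) returns a model or None; describe(names) returns a
    model. Both are hypothetical stand-ins for the TF back end.
    """
    results = []
    missing = []
    for name in line.split():
        doc = fetch_document(name)
        if doc is not None:
            results.append(doc)      # a canned entry exists for this name
        else:
            missing.append(name)     # no document found: describe it later
    if missing:
        results.append(describe(missing))
    return results

canned = {"ex:Cat": "canned entry for ex:Cat"}
out = fetch_or_describe(
    "ex:Cat ex:Dog",
    fetch_document=canned.get,
    describe=lambda names: "DESCRIBE " + " ".join(names),
)
```

In this toy run, ex:Cat is served from the canned entry and only ex:Dog falls through to a DESCRIBE query.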

The next three radio buttons on the left side of the form search and describe terms by baseform string pattern using the SPARQL regular expression filter.

The right side of the form runs user defined SPARQL queries. A query file can be specified by name or the query text can be just typed or copypasted in the text area.

The Query button executes the chosen query with the settings shown at the bottom of the form. The Clear button removes messages from the page.

Save settings let the user specify a descriptive name for the query result. The place where the results are saved appears in the download address. If no modelID is specified, TF creates a long random name (a UUID).

The Download button downloads the document specified in the download address.

The Edit button copies the download address to EditForm and opens EditForm. The EditForm does not fetch the document automatically, in case some options should be set before downloading the document in the editor.

The Copy button copies the contents of the download address to the upload address. (The upload address must be a writable TF address.)

Edit

EditForm
Editor

With parameter form=true, the TF EditForm serves a standalone editor. Option lang=fi produces a Finnish version.

Editor live

screenshot of editor

Editing options

There are a variety of ways of editing TF content, depending on which facilities and options are used. The following table singles out some of them.

TF edit styles
action editor query edit cache active schema template
Discuss and comment
Edit entries locally yes
Edit web source yes yes
Validate edits yes yes yes
Save and share source in TF database yes yes yes
Save and share edits in TF database yes yes yes
Edit an ontology entry yes yes yes yes
Edit third party format as TF yes yes
Edit 3rd party ontology entry yes yes yes yes yes yes yes
Edit any html content yes yes yes yes

The edit form lets the user fetch a prefabricated TF document to a textarea editor (the URL button), or make a TF describe query for a resource URI (the URI button). The results of the query are shown in the editor in TF XHTML format. If the URL of an active model was given, resources and statements found in the active model are boldfaced. If a localisation language (and ontology) is given, resource labels are localised according to them.

The user can then edit the XHTML document using the editor. On Save, the editing changes are interpreted against the active model and an edited version of the active model saved in the TF database cache. The database name (modelID) of the saved version is shown on the page. On Upload, the saved version is uploaded to the given url on a TF webserver.

Note that it is an edited copy of the active model, not the editor contents, that gets saved. The point is that the contents in the editor may come from many ontologies, as the result of a multi-repository or multi-site query. The extra information in the query result may be needed to decide how to edit the active content, but only the currently active ontology is subject to change. Also, separating the edits from the active model makes it possible to edit selected parts of a large ontology without dragging all of it to the editor.

As a special case of the above, consider the case when the edits and the active model coincide. This happens, in particular, in editing an existing entry (an XHTML format document). In this particular case, if the active model is not set, the servlet sets it equal to the source. On Save, what is saved is a new version of the edited entry. In this particular case, too, the servlet tries to update the edit area with the saved active model. (Normally, when the active model is different from the edited contents, a Save does not update the edit area.)

Edit form

The edit form contains, besides the term edits textarea housing the CKEditor, a menu of settings. This part of the form pops up from the Set button on the CKEditor toolbar.

screenshot of editor popup menu

Although the settings menu appears to pop up "from inside" the textarea editor, it is actually a part of the same form as the textarea editor. Its looks are determined by the edit form's stylesheet editform.css. The controls of the menu are described below.

no routing Check this to disable TF location mapping
search in DB Check this to include database in search
save in DB Check this to save edits in database
source URI/s of the resource/s to edit
template
schema
active
locals
language
hyperlinks
document Push this button to fetch a document (URL query)
resource Push this button to generate an entry for a resource (URI query)
blanks Use this select to rewrite blanks
upload location
model ID
Cancel

The database provides a persistent cache to hold edits between edit sessions. When 'save in DB' is checked, the results of edit actions are saved in the default TF database. When 'search in DB' is checked, and location mappings are set up to look first for a database version of a resource, the next query for the resource gets the version last saved in the database. This lets the editor stop editing a version of an entry, go away to do other things, and return to the version he worked on last. Or it can be a community of editors sharing a version. Then the usual difficulties of document sharing arise, and they have to be solved by some locking or conflict resolution mechanism. If the editor sits on some collaborative platform like MediaWiki, shared versions can be saved on the platform, leaving it to the platform to take care of turn taking.

The subsequent options from template to hyperlinks instruct the XHTML writer. They are usually defaulted so that one need not set them. The defaults are shown in the menu. Unless set by the user, they are fetched from the entry's meta triples or HTML header.

The radio button provides the first two options of the TF query form. The document button does a URL query, i.e. uses Retry to fetch a document by name. (If TF routing is used, the name can be anything defined in the location mappings. If not, it should be an accessible URL.) The resource button calls for a DESCRIBE query for the resources whose names are listed in the source. The editor Query button currently only provides these two query types. When a more complex query is called for, use the query form and save the results somewhere to fetch into the editor.

The Edit button sends the contents of the editor to the edit servlet, which relays them to the EditService back end for validation. If the edits are well-formed, the edit service rewrites the XHTML (to show what the edits actually amount to) and returns it to the editor. If the database save checkbox is checked, the edits are also saved in the database. Further, if an active ontology was specified, the edit service also updates the active ontology with the edits and saves an updated version of the active ontology in the database. The safe strategy is to first uncheck the save checkbox and use Edit just to validate the edits. If they look right, then check the save box and click Edit another time. When returning to editing after a while, if there is a risk that the edits in the editor have gone stale, get the newest version from the database by checking 'search in DB'. (This assumes location mappings are turned on and configured to check the default database first.)

The blanks select menu gives choices for rewriting blank nodes in an entry. Option deblank replaces blank nodes with blank resource names of the form urn:blank:... . This gives the blanks a persistent if precarious identity that remains between rewrites. By the RDF standard, RDF reads/writes do not preserve blank IDs. Option reblank rewrites blank URNs back to normal blanks. Option relabel tries to use the TF rewrite utility to invent descriptive labels for terms and expressions according to the TF descriptive label naming convention.
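The round trip between blanks and URNs can be pictured with a small sketch. It treats a model as a set of string triples and blank nodes as labels starting with _: ; what follows the urn:blank: prefix is not specified in this document, so reusing the blank label there is an assumption of the sketch, as are the function bodies themselves.

```python
PREFIX = "urn:blank:"  # what follows the prefix is an assumption of this sketch

def deblank(triples):
    """Replace blank node labels (_:x) with urn:blank: names."""
    def fix(term):
        return PREFIX + term[2:] if term.startswith("_:") else term
    return {tuple(fix(t) for t in triple) for triple in triples}

def reblank(triples):
    """Turn urn:blank: names back into blank node labels."""
    def fix(term):
        return "_:" + term[len(PREFIX):] if term.startswith(PREFIX) else term
    return {tuple(fix(t) for t in triple) for triple in triples}

model = {("_:b0", "rdfs:label", "cat"), ("ex:Cat", "ex:hasTerm", "_:b0")}
named = deblank(model)
```

After deblank, the former blank node has the same name in both triples, so the link between them survives being written out and read back; reblank restores the original form.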

The Upload button uses IndexService to upload the edits to the given upload location (a TermFactory web directory URI). If there are no edits but a modelID, the model at the given modelID is uploaded.

CKEditor

TermFactory extends CKEditor with a TF specific plugin.

It adds to the CKEditor toolbar some extra buttons and a menu.
Set opens the popup menu for editor settings
Query shortcut to the Query submit button on the settings menu.
Edit shortcut to the Edit submit button on the settings menu.
Upload shortcut to the Upload submit button on the settings menu.
Menu opens the TermFactory menu.

The toolbar shortcut buttons apply the settings in force in the edit form. They just save the trouble of opening the popup when there is no need to change the settings. The TF Insert menu is also on the CKEditor context menu (right button of mouse).

Since the TF editor is a fully featured HTML editor, it is always possible for a user to enter content anywhere in an entry by just typing it there, or by cutting and pasting content anywhere and then editing it. Once one is at home with some terminology and entry format, that may well be the most convenient way of working. However, should one forget what can go where, or just want to avoid typos, there is the TF Insert menu.

CKEditor TF Insert menu

The purpose of the TF Insert menu is to help users choose properties and values suited for each type of resource. The menu has (currently) four tabs. The menu is opened at a tab that matches the current selection in the editor text area: term for terms (signs) and definitions (messages), exp for expressions and texts (forms), ont for concepts (meanings), and any for unclassified properties. Each tab has (currently) just two text inputs, one for property and the other for its value. The four tabs differ only in how the input fields are autocompleted. A click on a property input tells TF Insert menu to look for a property or value among those available for the tab. Another click opens a selection of subproperties of the property. Clicking on the selection repeats the process, until the desired property appears.

A click on a value input makes the TF Insert menu look for classes in the range of the currently selected property. (If the selected property is a datatype property, the TF Insert menu autocompletes with literal values for the property. If there is no input property, search starts with the top class for the tab.) Another click opens a menu of subclasses of the value to choose from. Clicking on the selected value starts the subclassing process again with the selection. If the selected class has no subclasses, the TF Insert menu looks for its instances.
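The value lookup just described can be sketched as two small functions. The dictionaries below are stand-ins for autocomplete lists; their shape (a name mapped to a list of candidates), the function names, and the ex: names are all assumptions made for illustration, not the actual TF data format.

```python
def first_click(prop, ranges, literals, top_class):
    """Candidates offered when the value input is first clicked."""
    if not prop:
        return [top_class]            # no property: start from the tab's top class
    if prop in literals:
        return literals[prop]         # datatype property: offer literal values
    return ranges.get(prop, [])       # object property: offer range classes

def next_click(value, subclasses, instances):
    """Candidates offered when an already chosen value is clicked again."""
    subs = subclasses.get(value, [])
    # Fall back to instances only when the class has no subclasses.
    return subs if subs else instances.get(value, [])

ranges = {"ex:hasConcept": ["ex:Concept"]}
literals = {"ex:status": ["draft", "approved"]}
subclasses = {"ex:Concept": ["ex:Animal"]}
instances = {"ex:Animal": ["ex:Cat"]}

step1 = first_click("ex:hasConcept", ranges, literals, "ex:Top")
step2 = next_click(step1[0], subclasses, instances)
step3 = next_click(step2[0], subclasses, instances)
```

Repeated clicks thus walk down from the range class through its subclasses until a class without subclasses is reached, at which point instances are offered.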

The TF Insert menu has one submit button called Insert and two exit buttons, Cancel and OK. The Insert button uses the input value to locate a template from a predefined collection of input templates. An input template is a predefined bit of XHTML associated to the currently selected value. The Insert button tells TF Insert menu to insert the template as the value of the input property in the property list containing the current selection in the text area. If no template exists for the input value, the input value as such gets inserted. Insertion is undoable (select Undo from CKEditor toolbar or hit Ctrl-Z).

The OK button closes the menu but remembers the current inputs for the next time the menu is opened. It also creates a shortcut to the current inputs in the input autocomplete list. Such shortcuts are of form property=value . Selecting a shortcut restores its property and value as the current inputs. Once the inputs are restored, one can use them to do another autocomplete or insert.

The Cancel button closes the menu and clears the tab and the shortcuts. The next time, the canceled tab opens with its startup values.

TF autocomplete lists

The autocomplete lists for TF Insert menus are stored in directory fe/TermFactory/data/ .

instances.json autocomplete list from classes to their instances
literals.json autocomplete list from datatype properties to their literal values
ranges.json autocomplete list from properties to their range classes
subclasses.json autocomplete list from classes to their subclasses
subprops.json autocomplete list from properties to their subproperties
subprops-top.json autocomplete list from tab top properties to their subproperties

The autocomplete list subprops-top.json is hand made and contains the top properties of the tabs. Autocomplete lists can be generated from any ontology using the SPARQL queries in directory io/sparql/ . An example command line:

pellet4tf query -e SPARQL -q $TF_HOME/io/sparql/subclasses.sparql -o JSON $TF_HOME/owl/TFS.owl > subclasses-sparql.json

But note that autocomplete lists need not match any ontology hierarchy; they can be customised to do whatever seems practical. The customisation can happen at any level: custom autocompletion ontologies, custom autocompletion queries, or just custom autocompletion lists.
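As one example of customising at the list level, query results can be regrouped into a lookup table. The sketch below assumes the query output follows the standard SPARQL JSON results layout (results, bindings, value); whether the TF tools emit exactly that layout, and the variable names class and sub, are assumptions made for illustration.

```python
import json
from collections import defaultdict

def bindings_to_map(results_json, key_var, value_var):
    """Group SPARQL JSON result rows into {key: [values]}."""
    data = json.loads(results_json)
    grouped = defaultdict(list)
    for row in data["results"]["bindings"]:
        grouped[row[key_var]["value"]].append(row[value_var]["value"])
    return dict(grouped)

# Hand-written sample in the SPARQL JSON results layout (hypothetical data).
sample = json.dumps({
    "head": {"vars": ["class", "sub"]},
    "results": {"bindings": [
        {"class": {"type": "uri", "value": "ex:Animal"},
         "sub": {"type": "uri", "value": "ex:Cat"}},
        {"class": {"type": "uri", "value": "ex:Animal"},
         "sub": {"type": "uri", "value": "ex:Dog"}},
    ]},
})
lookup = bindings_to_map(sample, "class", "sub")
```

A table of this kind can then be edited by hand or filtered before it is used for autocompletion.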

instances.sparql literals.sparql ranges.sparql subclasses.sparql subprops.sparql

TF input templates

TF input templates are stored in directory fe/TermFactory/data/templates. A template can be defined in any TF compatible format. It can be written by hand or copied from a relevant ontology schema. A template definition can be compiled into XHTML using the rewrite facility. The (current) convention is that a template for a given resource is named with the prefixed name of the resource, so that e.g. the template to load for http://tfs.cc/term/Term is called term:Term.xhtml.
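The naming convention can be illustrated with a short sketch that derives a template file name from a resource URI. The function name and the prefix table are hypothetical; the single prefix binding is inferred from the example above and should be read as an assumption.

```python
def template_name(uri, prefixes):
    """Return '<prefix>:<local>.xhtml' for a URI, or None if no prefix matches."""
    # Try the longest namespace first so nested namespaces resolve correctly.
    for namespace, prefix in sorted(prefixes.items(), key=lambda kv: -len(kv[0])):
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):] + ".xhtml"
    return None

# Prefix binding inferred from the example above; treat it as an assumption.
prefixes = {"http://tfs.cc/term/": "term"}
name = template_name("http://tfs.cc/term/Term", prefixes)
```

A resource with no matching prefix yields None, which corresponds to the case where no template exists and the input value is inserted as such.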

TF front end localization

The TF dialog front end, including the menu, can be localized to the current interface language. Both the interface texts and the autocomplete options get localized. The dialog localization happens by reference to the json file named in javascript variable TFConfig.data.locals . (By default, it is fe/TermFactory/data/locals2.json .)

The TF json localization file format that gets generated with the pellet4tf query script in io/script/localize can be used as such to localize the TF CKEditor dialogues. But the format can be minimised further using the javascript localize.js residing in the front end directory fe/TermFactory/js/ . The json generated with localize.js is optimized not so much for size as for quick lookup of terms in the editor. The CKEditor javascript front end understands both formats. In Ubuntu, the localize.js script can be used from command line as indicated by the header of the script:

/* localize.js 30.11.11
 * convert sparql output locals.json to localisation map locals2.json
 * Usage: in fe/TermFactory/data/ do
 * js -f ../js/localize.js locals.json > locals2.json
 */

Here is a sample sequence of command lines to produce a menu localization file. The first line creates a json localization file from TFS.owl. The second line minimizes it. jslint.js is just a helper script to check the localization file syntax. It complains if something went syntactically wrong in the process.

lcarlson@tf-exia:~/Data/CF/TF/fe/TermFactory/data/ localize > locals.json
js ../js/localize.js locals.json > locals2.json
js ../js/jslint.js locals2.json
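Where the js shell is not at hand, the final syntax check can be approximated with a few lines of Python. This is a sketch that assumes the localization file is plain JSON; it uses only the standard json module and does not reproduce the other checks jslint.js makes.

```python
import json


def check_json(text):
    """Return None if text is well-formed JSON, else a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


if __name__ == "__main__":
    import sys
    # Usage (the file name is taken from the example above): python check.py locals2.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        problem = check_json(handle.read())
    print(problem or "ok")
```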

Warning: TF dialog interface localization strings are also in locals2.json. When changing the dialog localization files, make sure the localization source contains translations for the interface strings. The interface concepts are listed in the TF localization ontology owl/TFL10n.owl .

The CKEditor plugin is in fe/TermFactory/js/term.js . The TF Insert menu is defined in fe/TermFactory/js/dialog.js .

There is a tutorial on writing CKEditor plugins in Woofie .

In ckeditor (v. 3.5.1), css color and text-decoration settings for html anchor elements are declared !important (in ckeditor plugin about.js ), with the effect that anchor elements in user content always show with default formatting (blue underlined). A workaround is to supply user-defined anchor settings with css !important declaration as well. See etc/tf2xhtml.css for an example.

It is simplest to use absolute uris to link style files to content edited in the ckeditor. In the ckeditor instance bundled in TermFactory webapp, relative uris resolve to the webapp's root directory ($CATALINA_HOME/webapps/TermFactory/).

CKEditor and its predecessor FCKEditor have a built in template facility that can also be used for TF purposes.

Wiki

A TermFactory Wiki has been implemented on MediaWiki as a Special page run by a MediaWiki extension that embeds the CKEditor javascript textarea editor. MediaWiki hosts the QueryForm servlet in an inline frame on the page. The query result shown in the frame is transferred to the editor text area, where it can be worked on using a TF extension of the CKEditor javascript editor. The edits get saved on the MediaWiki page (database) between editing sessions. The wikipage version of an entry becomes an XHTML insert on the page. It can be edited as is on the MediaWiki page, or copy-pasted to another XHTML editor. The edits can be cached to the repository database with the term editor page Save button. The Save button also saves the changes to the active ontology in the database. An ontology can be published on a site's ontology collection in a WebDAV file server using the Upload button.

Comment

In forums, too, we want to keep a distance from specific implementations. One message standard in this area is RSS , which supports internet messaging between forums, blogs and other similar discussion platforms. More recently, json (jsonp) has gained popularity. It seems that the RDF based RSS 1.0 specification never caught on, and it has been superseded in practice with the XML based RSS 2.0 specification.

Disqus

So far, there is no forum implementation for TermFactory. The TermFactory MediaWiki platform is set up to generate Disqus comments threaded by the TF resource uri given in the page title. In Disqus terms, the TF resource uri is used as the Disqus identifier instead of the hosting page url. In this way, comments on the same resource coming from different platforms get collected together and can be reviewed as a group.
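The effect of threading by resource uri rather than by page url can be sketched in Python. The sketch does not call the Disqus API; the comment records and page addresses are made up for illustration.

```python
from collections import defaultdict


def group_by_resource(comments):
    """Collect comments into threads keyed by TF resource uri."""
    threads = defaultdict(list)
    for comment in comments:
        threads[comment["resource"]].append(comment)
    return dict(threads)


# Hypothetical comments on the same resource arriving from two platforms.
COMMENTS = [
    {"resource": "http://tfs.cc/ont/Parser",
     "page": "https://wiki.example.org/Parser", "text": "Definition too narrow?"},
    {"resource": "http://tfs.cc/ont/Parser",
     "page": "https://blog.example.net/post/12", "text": "Agreed."},
    {"resource": "http://tfs.cc/ont/Finnish",
     "page": "https://wiki.example.org/Finnish", "text": "Add a Swedish term."},
]

if __name__ == "__main__":
    for resource, thread in group_by_resource(COMMENTS).items():
        print(resource, len(thread))
```

Had the hosting page url been used as the key, the two comments on the Parser concept would have ended up in separate threads.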

TF system design

This figure shows the component types of a TermFactory system graph. Each type of component is exemplified with a concrete instance (one actually used in the reference implementation).

Figure: TF system

Workflows

This section documents workflows and best practices in TF based terminology work. There is a general section on division of labor and the terminology workflow, and a section on how the work happens on different platforms. The latter is further divided into sections on professional terminology tools, the TF wiki platform, and TF panel inserts on other platforms.

The increase in user created content and interactivity gives rise to issues of control over the community and ownership of the jointly-created content. This gives rise to fundamental legal issues, such as Intellectual Property and property ownership rights.

Collaborative terminology work vs. traditional terminology

Since there is no regimented methodology for collaborative terminology work yet, there are both possibilities and risks. The following table compares traditional terminology work per work phase to the TF workflow. TF methods do not oust traditional ones, but complement them.

Phase | Traditional | TermFactory
Source collection | Books, journals | Community awareness, web harvesting
Term candidate collection | Perusal of documents | Community awareness, web content statistics
Term choice | Committee | Community voting
Concept analysis | Drawing | Ontology editing
Term description | Dictionaries and grammars | Expression ontology
Compilation | Text editing | Query language
Publishing | Publishing house | Transformation pipeline

A virtual expert community is a natural accumulation point for links to relevant documentary sources. The TermFactory user base keeps a steady flow of term proposals just by failing to find terms in the repository. Community portals can also be actively harvested for terms not covered by the repository using ContentFactory information retrieval, fact extraction and term / keyword extraction tools. The user base's preferences for term usage can be monitored in the community using original or active voting schemes. Such statistics can be used as is (descriptive terminology) or to support authoritative decisions in grading or (de)selecting term candidates (normative terminology, harmonisation). Term choices can also be evaluated against the expression ontology of the complete TF repository system so as to avoid inter-sector term clashes and accidental homonymy, and to enhance terminological consistency. The primary source for TF terminology is the repository system (or a subset of authoritative servers). When separate compilations are needed for special purposes, for publication on some restricted channel or the like, the desired subset can be retrieved from the system using suitable database queries. The results of the queries, being in a structured standard form, can be rendered in desired formats according to publication channel using fully automatic transformation pipelines.

Another table of comparisons can be made according to the roles and responsibilities of the actors in the traditional versus TF terminology workflow. Here, there are many opportunities but also potential risks and losses, not the least for those whose roles are susceptible to change in the process.

Dimension | Traditional | TermFactory
Actors | Terminologist(s), subject expert(s), term committee | Terminologist(s), subject expert(s), users
Workgroup methods | Person-to-person interviews, committee meetings | Web community
Schedule and motivation | Fixed schedule for a fee | Shared interest
Roles and authority | Fixed roles and authority | Based on shown merit
Languages | Few with a fixed definition language | Many languages
Sources | Predetermined | Wide variety
Purpose | Predetermined | Not restricted
Lifecycle | Project duration | Continuous

In the traditional workflow, terminologies are made to order by professional terminologists working either in-house (relatively few organizations can afford to have in-house terminologists) or as a paid service by terminology organizations or language services. As a separate process, terminology is not a very profitable business, owing to the high manpower cost of quality terminology and the relatively low priority given to terminology by clients. Worldwide, there are not many companies or organisations solely specialising in terminology work; many of the existing ones are small and/or get public funding. The profits from high quality terminology for a company or organisation whose main business is elsewhere are long term and indirect, and usually relegated under some low profile budget item. It makes more sense to integrate terminology with some more pressing mainstream business item or service. Thus larger language services can do terminology as a part of a wider language service offering, like language training, translation, multilingual content management, or the like.

Part of the high cost of terms comes from expensive expert time. Expert committees are ill attended and tend to converge slowly because terminology disagreements often hide territorial conflicts. While no technology can remove these problems, collaborative terminology platforms can help reduce these bottlenecks by allowing a wider range of opinions and more freedom from calendar conflicts. The fear might be raised that web communities are too amorphous and unruly to contribute reliable input. This criticism uses the free-for-all web communities like Wikipedia as the thought model. Even Wikipedia is surprisingly good at correcting itself, but there is no need to think that TF expert communities are anonymous or free for all. Membership of a given community may well be restricted and the contributors identified. Technology is not the limiting factor here.

An interesting possibility is to use experience-based attribution of authority by the community, a method that works well in many web-based expert communities. An expert whose advice has been useful for a large number of users gains authority points (counting thanks from users, or some other measurable proof of quality). The community can use this information to sort or weigh alternative opinions on a given topic. Although one can imagine many ways in which this idea might not work, perhaps surprisingly, in practice it does.

Another possible criticism is that the expert community can only have public discussions. But nothing prevents one-to-one or other private discussion en petit comité among members of an expert forum, either on the forum or separately, whether in real time chat, forum or email style.

It is important to separate technology from its application. A case in point is the question of motivation. Professional terminology work is done for a fee, while one of the seductions of the idea of collaborative terminology work is that it might happen for a shared interest, on a tit-for-tat basis - or for indirect profit or gratification, like the visibility and authority gained from contributing to communities like Wikipedia or LinkedIn. But all this is again independent of the technology. It is quite as possible to build a TermFactory installation working on a fee-for-service basis. For information providers (terminologists, experts) the payment is based on content rendered and approved by the buyer(s). For information consumers (terminology users), the payment could be on a subscription basis, or measured by term downloads. Different user roles, rights and obligations can be defined as is done in many existing collaborative work platforms.

Professional terminologists are trained to follow standard work methods and quality norms in their work. If the work is spread on a larger number of subject experts and users, quality may deteriorate. True; this is one of the areas where TF needs to come up with innovative solutions. Roughly, we need a regimented upward flow in a TF repository system, rather reminiscent of the Wikipedia echelons of quality checking. When term suggestions come in, they are made available to the community with a low reliability status. When they have been revised by the community and perhaps passed inspection by a select group of experts, they get a higher status. At the end, they may be adopted by the repository authority as part of the "normative" core of the collection. (This process is one where the users too may get promoted to a higher status.)
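The upward flow could be modelled as an ordered list of reliability statuses. The Python sketch below is only one way to read the paragraph above; the status names are hypothetical and no such vocabulary is defined by TF here.

```python
# Hypothetical reliability statuses, from lowest to highest.
STATUSES = ["proposed", "community-reviewed", "expert-approved", "normative"]


def promote(status):
    """Return the next higher status; the top status stays where it is."""
    index = STATUSES.index(status)
    return STATUSES[min(index + 1, len(STATUSES) - 1)]


def is_at_least(status, required):
    """True if status is as reliable as required, or more so."""
    return STATUSES.index(status) >= STATUSES.index(required)


if __name__ == "__main__":
    status = "proposed"
    while status != "normative":
        status = promote(status)
        print(status)
```

A repository query could then filter term suggestions with is_at_least to show, say, only entries that have at least passed community review.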

One crucial quality requirement for professional terminology is source indication. This plays an important role in the subsequent authorisation of terminology. Here, information technology can help a lot. First, the platform may make it easy for users to add source indications in a way that minimally is traceable to the source and at best follows a given norm (say, using web addresses and/or one of the many bibliography formats in use). Second, terminology suggestions can be evaluated by the reliability and authority of the source and/or the proposer(s).

TFS terminology repositories as a rule are an ongoing concern. This is a major improvement on the status quo, where typically a terminology project is started when earlier terminology collections are hopelessly obsolete, and by the time the new collection is out it is already obsolescent itself. A TF repository is open 24/7, so it can catch the newest fads. On the other hand, the repository system is under revision control, so that earlier editions can be kept accessible, or an authorised edition kept alongside the nightly release. It is also possible to open and close special purpose scheduled terminology projects using the repository system as a base of data. There is no need to throw away anything that works, just because there are more alternatives available.

Workflow

User roles in the TF workflow

In a professional terminology use scenario, participants in the terminology workflow can assume different roles.

  1. General users search the terms (Stream 1, Query) and comment on them (Stream 2, Comment).
  2. Moderators check the comments (Stream 2) and make new term modification proposals (Stream 3, Wiki).
  3. Advanced terminology/linguistic professionals review the proposals (Stream 3), then make changes to repositories (Stream 4, RDF editor).

Though contents produced in Stream 4 will eventually be merged into Stream 1, all streams stay separate while still in process. The higher numbered streams feature higher data quality and more restricted access rights. These streams of data are in principle platform neutral, so that future users can set up their favored toolkits to consume and also contribute to the professional data. In practice, platforms are different, since lower streams are textual, while the top level consists of ontologies. Who is allowed to bear which roles is left open: in some scenarios, everyone can be a moderator, in others there can be some pecking order.

TermFactory actors

TermFactory actors have different roles.

  1. Ordinary users search for terms (Stream 1, Query) and comment on them (Stream 2, Comment).
  2. Moderators follow the discussions (Stream 2) and edit terms (Stream 3, Wiki).
  3. Terminologists review the changes (Stream 3) and make the version changes (Stream 4, Wiki).

TermFactory roles around MediaWiki

Here we consider the division of labor in a Wiki based TermFactory community and the associated user roles.

Approved TermFactory specific descriptive information about some resource, say 'cat', meant for human consumption can be entered on a wiki page associated to the resource. The content of the page should agree with the TermFactory entry about this sense of the word 'cat'. If it does not, one or the other needs updating.

Also on the page is the snapshot of a TermFactory entry for this term resource that goes with this version of the page. It is easy to browse to an earlier version of the entry and the accompanying text using MediaWiki page history.

One plausible division of labor is the following.

  • Unregistered users can browse the open collection.
  • Unregistered users can discuss terms using the Disqus commenting system.
  • Unregistered users can discuss MediaWiki pages through the MediaWiki Discussion tab.
  • A wiki page in an open collection can be edited by self-registered (i.e. identifiable) users.
  • A wiki page in a closed collection can be edited by other-registered (i.e. approved) users.
  • The TF entry associated to a resource can be edited by approved users.
  • An ontology can be edited with a TF entry by authoritative users (owners of a collection).

Each echelon can be further divided by category to specialists of each category.
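The division of labor listed above can be written down as a small permission table. The Python sketch below assumes that the user levels are cumulative, so that each level may do everything the levels below it may do; that assumption, like the level and action names, is hypothetical and not stated in the list.

```python
# User levels from least to most trusted (names are hypothetical).
LEVELS = ["unregistered", "self-registered", "approved", "owner"]

# Lowest level allowed to perform each action, following the list above.
REQUIRED_LEVEL = {
    "browse_open_collection": "unregistered",
    "comment": "unregistered",
    "edit_open_wiki_page": "self-registered",
    "edit_closed_wiki_page": "approved",
    "edit_tf_entry": "approved",
    "edit_ontology": "owner",
}


def is_allowed(user_level, action):
    """True if user_level ranks at or above the level the action requires."""
    return LEVELS.index(user_level) >= LEVELS.index(REQUIRED_LEVEL[action])


if __name__ == "__main__":
    for level in LEVELS:
        allowed = [a for a in REQUIRED_LEVEL if is_allowed(level, a)]
        print(level, allowed)
```

Dividing an echelon further by category would amount to adding a category key to the table.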

Ontology work

This section documents one aspect of ontology based professional terminology work, viz. mixing and matching of term collections using OWL editors, reasoners, and converters. Using ontology tools can save manual labor in matching and merging data from different sources. Existing OWL reasoners and query engines are generic tools whose application to terminological ontologies requires a high level of ontology expertise and remains semiautomatic at best. The aim of the TF specific ontology tools is to telescope the workflow so as to minimise expert intervention.

Extracting

One of the basic tasks in using TF is extracting some desired subset of data from one or more repositories as a dataset. This section surveys different ways of going about this task.

Extracting subsets using Jena and Pellet

To cut out a desired subset of an ontology, one can use the Jena RDF query engine ARQ and the Clark&Parsia SPARQL-DL query engine Pellet . The ARQ RDF query engine implements all of RDF SPARQL, including the SPARQL language DESCRIBE query and the FILTER function REGEX. On the other hand, since it is an RDF tool, it does not do OWL reasoning or understand OWL imports. The Pellet ARQ engine queries individual triples with Pellet and does the rest with ARQ. The Pellet Mixed engine uses Pellet for basic graph pattern queries but uses ARQ to apply SPARQL UNION or FILTER constructs on the results. Running the Jena ARQ command line tool sparql with the command line

sparql --query=tfquery2d.sparql --data=../../owl/TFS.owl

where tfquery2d.sparql is the following query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX term: <http://tfs.cc/term/>
DESCRIBE ?inst
WHERE { ?inst rdf:type term:Term . }

prints out an ontology describing instances of class term:Term in terms of the statements asserted about them in TFS.owl (in Turtle rdf format).

Query results:

@prefix : <http://tfs.cc/owl/TFS.owl#> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix meta1: <http://tfs.cc/meta1/> .
@prefix ont1: <http://tfs.cc/ont1/> .
@prefix exp0: <http://tfs.cc/exp0/> .
@prefix term1: <http://tfs.cc/term1/> .
@prefix exp1: <http://tfs.cc/exp1/> .
@prefix tfs: <http://tfs.cc/> .
@prefix term0: <http://tfs.cc/term0/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix ont0: <http://tfs.cc/ont0/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix owl2xml: <http://www.w3.org/2006/12/owl2-xml#> .
@prefix term: <http://tfs.cc/term/> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix isocat: <http://isocat.org#> .

term1:en-English-N_-_ont-English
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    term:designation expression1:en-English-N ;
    term:hasReferent concept0:English .

term1:zh-程序-N_-_ont-Parser
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    term:hasDesignation expression1:zh-程序-N ;
    term:hasReferent concept0:Parser .

term1:zh-芬兰语-N_-_ont-Finnish
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    term:hasDesignation expression1:zh-芬兰语-N ;
    term:hasReferent concept0:Finnish .

term1:fi-jäsennin-N_-_ont-Parser
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:TFS ;
    term:hasDesignation expression1:fi-jäsennin-N ;
    term:hasReferent concept0:Parser .

term1:en-for-P_-_ont-Goal
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    term:hasDesignation expression1:en-for-P ;
    term:hasReferent concept0:Goal .

term1:zh-剖析-V_-_ont-Parse
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    term:hasDesignation expression1:zh-剖析-V ;
    term:hasReferent concept0:Parse .

term1:en-Chinese-A_-_ont-Chinese
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    meta:hasSource meta1:Wang ;
    term:hasDesignation expression1:en-Chinese-N ;
    term:hasReferent concept0:Chinese .

term1:zh-为了-P_-_ont-Goal
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    term:hasDesignation expression1:zh-为了-P ;
    term:hasReferent concept0:Goal .

term1:zh-在___里-P_-_ont-Place
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    term:hasDesignation expression1:zh-在___里-P ;
    term:hasReferent concept0:Place .

term1:zh-是-V_-_ont-Role
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    term:hasDesignation expression1:zh-是-V ;
    term:hasReferent concept0:Role .

term1:en-in-P_-_ont-Place
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:TFS ;
    term:hasDesignation expression1:en-in-P ;
    term:hasReferent concept0:Location .

term1:en-be-V_-_ont-Role
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    rdfs:comment "English verb for the subclass relationship"@en ;
    meta:hasSource meta1:TFS ;
    term:hasDesignation expression1:en-be-V ;
    term:hasReferent concept0:Role .

term1:zh-部分-N_-_exp-Preposition
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:TFS ;
    term:hasDesignation expression1:zh-部分-N ;
    term:hasReferent expression0:Preposition .

term1:zh-英语-N_-_ont-English
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    term:hasDesignation expression1:zh-英语-N ;
    term:hasReferent concept0:English .

term0:Term
    rdf:type owl:Thing ;
    rdf:type term:Term .

term1:fi-suomi-N_-_ont-Finnish
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:TFS ;
    term:hasDesignation expression1:fi-suomi-N ;
    term:hasReferent concept0:Finnish .

term1:en-part-N_-_ont-Part
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:TFS ;
    term:hasDesignation expression1:en-part-N ;
    term:hasReferent concept0:Part .

term1:zh-部分-N_-_ont-Part
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    term:hasDesignation expression1:zh-部分-N ;
    term:hasReferent concept0:Part .

term1:en-parse-V_-_ont-Parse
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:TFS ;
    term:hasDesignation expression1:en-parse-V ;
    term:hasReferent concept0:Parse .

term1:en-program-N_-_ont-Program
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:TFS ;
    term:hasDesignation expression1:en-program-N ;
    term:hasReferent concept0:Software .

term1:zh-中文-N_-_ont-Chinese
    rdf:type owl:Thing ;
    rdf:type term:Term ;
    meta:hasSource meta1:Ji ;
    meta:hasSource meta1:Wang ;
    term:hasDesignation expression1:zh-中文-N ;
    term:hasReferent concept0:Chinese .
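What the DESCRIBE query does here can be approximated in a few lines of Python over plain triples. This is a simplified sketch, not ARQ's algorithm: the SPARQL specification leaves the exact content of a description to the query service, and the sketch keeps only statements that have a matching instance as their subject. The sample triples are copied from the results above, except the last one, which is added to show a statement that is left out.

```python
RDF_TYPE = "rdf:type"

# Sample statements; the last one is about a class, not a term instance.
TRIPLES = [
    ("term1:en-part-N_-_ont-Part", "rdf:type", "term:Term"),
    ("term1:en-part-N_-_ont-Part", "meta:hasSource", "meta1:TFS"),
    ("term1:en-part-N_-_ont-Part", "term:hasDesignation", "expression1:en-part-N"),
    ("term1:en-part-N_-_ont-Part", "term:hasReferent", "concept0:Part"),
    ("term0:Term", "rdf:type", "term:Term"),
    ("ont:Part", "rdfs:subClassOf", "ont:Concept"),
]


def describe_instances(triples, cls):
    """Return the asserted statements whose subject is typed as cls."""
    subjects = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return [t for t in triples if t[0] in subjects]


if __name__ == "__main__":
    for subject, predicate, obj in describe_instances(TRIPLES, "term:Term"):
        print(subject, predicate, obj, ".")
```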

Because ARQ as an RDF tool does not apply an OWL reasoner, one cannot use it to capture statements which are only implied by an ontology, such as transitive closures. To run a query over, say, all the subclasses or members of a class, we can use the Pellet SPARQL-DL query engine.
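The difference between asserted and entailed statements can be illustrated with a toy transitive closure in Python. The sketch covers only the transitivity of rdfs:subClassOf; an OWL reasoner such as Pellet derives much more. The four asserted pairs are taken from the TFS.owl examples in this section, written with the ont: prefix.

```python
# Asserted (subclass, superclass) pairs.
ASSERTED = {
    ("ont:Information", "ont:Inanimate"),
    ("ont:Inanimate", "ont:Concept"),
    ("ont:Instrument", "ont:Means"),
    ("ont:Means", "ont:Role"),
}


def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


if __name__ == "__main__":
    entailed_only = transitive_closure(ASSERTED) - ASSERTED
    for sub, sup in sorted(entailed_only):
        print(sub, "rdfs:subClassOf", sup)
```

A plain RDF query over the asserted pairs finds only Inanimate directly under Concept; the closure also places Information there.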

Running the Pellet command line tool pellet.sh with command line

pellet query -q tfquery0.sparql file:///home/lcarlson/Data/CF/TF/owl/TFS.owl

where tfquery0.sparql is the following query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ont: <http://tfs.cc/ont/>
PREFIX term: <http://tfs.cc/term/>
PREFIX exp: <http://tfs.cc/exp/>
PREFIX meta: <http://tfs.cc/meta/>
CONSTRUCT { ?conc rdfs:subClassOf ont:Concept . }
WHERE { ?conc rdfs:subClassOf ont:Concept . }

prints out an ontology that asserts all subClassOf statements under ont:Concept asserted or entailed by TFS.owl (in RDF/XML format):

Show/hide pellet log
<rdf:RDF xmlns:tfs="http://tfs.cc/" xmlns:isocat="http://isocat.org#" xmlns="http://tfs.cc/owl/TFS.owl#" xmlns:ont="http://tfs.cc/ont/" xmlns:term="http://tfs.cc/term/" xmlns:term1="http://tfs.cc/term1/" xmlns:owl2xml="http://www.w3.org/2006/12/owl2-xml#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:exp0="http://tfs.cc/exp0/" xmlns:exp1="http://tfs.cc/exp1/" xmlns:ont1="http://tfs.cc/ont1/" xmlns:ont0="http://tfs.cc/ont0/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:term0="http://tfs.cc/term0/" xmlns:meta="http://tfs.cc/meta/" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:exp="http://tfs.cc/exp/" xmlns:meta1="http://tfs.cc/meta1/" > <rdf:Description rdf:about="http://tfs.cc/ont/Program"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Ends"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Linguistics"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Information"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Language_industry"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Means"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Finnish"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Information_and_communication_technology"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Inanimate"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description 
rdf:about="http://tfs.cc/ont/Instrument"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Geography"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Software"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Source"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Domain"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Input"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Chemistry"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Data"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Place"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Patient"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Construction_industry"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Animate"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Nothing"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Function"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Chinese"> <rdfs:subClassOf 
rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Country"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Opinion"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Language_technology"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Agent"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Human"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Terminology"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Part"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Time"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/American_English"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Role"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Effect"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Goal"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Language"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Location"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description 
rdf:about="http://tfs.cc/ont/Multilingual_language_technology"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Cause"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Parser"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Concept"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/English"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> </rdf:RDF>

On the other hand again, the current version (Pellet 2.0.0.rc7) of the Pellet query engine does not do DESCRIBE queries. Also, the result set of a SPARQL-DL query contains both asserted and entailed statements indiscriminately.

Pellet is able to extract certain types of ontology subsets. pellet extract extracts all statements of a given OWL type from an ontology. The allowed types are:

DefaultStatements, AllClass, AllIndividual, AllProperty, ClassAssertion, DataPropertyAssertion, DifferentIndividuals, DirectClassAssertion, DirectSubClassOf, DirectSubPropertyOf, DisjointClasses, DisjointProperties, EquivalentClasses, EquivalentProperties, InverseProperties, ObjectPropertyAssertion, PropertyAssertion, SameIndividual, SubClassOf, SubPropertyOf. Example: "DirectSubClassOf DirectSubPropertyOf" (Default: DefaultStatements)

pellet modularity extracts "safe" modules (sub-ontologies) around given target entities. A safe module extracts enough statements around the target concepts that the inferences from the target concepts in the extract are the same as in the original ontology, i.e. any statement added to the extract will cause a conflict in the extract if it would contradict the original ontology. However, safe modules can be too large for practical purposes, if the ontology is well connected.

Different tools can be chained. Say we want to extract from TFS.owl all and only the statements asserted in it about classes that are (asserted or entailed) subclasses of ont:Concept. Here is one way. First select the relevant classes from TFS.owl using the Pellet engine. Temporarily add the inferred subclass statements to TFS.owl (as assertions, say using some editor). Then run ARQ to describe all the (now) asserted subclasses of ont:Concept in the extended model. The result is the desired sub-ontology. Though doable, this involves many tedious steps. The TF tools try to reduce such intermediate steps.

ARQ log:
@prefix : <http://tfs.cc/owl/TFS.owl#> .
@prefix expn: <http://tfs.cc/exp/> .
@prefix meta1: <http://tfs.cc/meta1/> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix exp0: <http://tfs.cc/exp0/> .
@prefix term1: <http://tfs.cc/term1/> .
@prefix exp1: <http://tfs.cc/exp1/> .
@prefix tfs: <http://tfs.cc/> .
@prefix term0: <http://tfs.cc/term0/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix ont1: <http://tfs.cc/ont1/> .
@prefix ont0: <http://tfs.cc/ont0/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix owl2xml: <http://www.w3.org/2006/12/owl2-xml#> .
@prefix term: <http://tfs.cc/term/> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix isocat: <http://isocat.org#> .

concept:Program rdf:type owl:Class ; rdfs:subClassOf concept:Software .
concept:Ends rdf:type owl:Class ; rdfs:subClassOf concept:Goal .
concept:Linguistics rdf:type owl:Class ; rdfs:subClassOf concept:Language_industry .
concept:Information rdf:type owl:Class ; rdfs:subClassOf concept:Inanimate .
concept:Language_industry rdf:type owl:Class ; rdfs:subClassOf concept:Domain .
concept:Means rdf:type owl:Class ; rdfs:comment "on or with which something goes or is done"@en ; rdfs:subClassOf concept:Role .
concept:Finnish rdf:type owl:Class ; rdfs:subClassOf concept:Language .
concept:Information_and_communication_technology rdf:type owl:Class ; rdfs:subClassOf concept:Domain .
concept:Inanimate rdf:type owl:Class ; rdfs:subClassOf concept:Concept .
concept:Instrument rdf:type owl:Class ; rdfs:subClassOf concept:Means ; rdfs:subClassOf [ rdf:type owl:Restriction ; owl:allValuesFrom concept:Function ; owl:onProperty concept:function ] .
concept:Geography rdf:type owl:Class ; rdfs:subClassOf concept:Domain .
concept:Software rdf:type owl:Class ; rdfs:subClassOf concept:Information ; rdfs:subClassOf concept:Instrument ; rdfs:subClassOf [ rdf:type owl:Restriction ; owl:hasValue concept0:Information_and_communication_technology ; owl:onProperty concept:hasDomain ] .
concept:Source rdf:type owl:Class ; rdfs:comment "semantic role for that which something comes from"@en ; rdfs:subClassOf concept:Role .
concept:Domain rdf:type owl:Class ; rdfs:comment "Domains are abstractions used to define subclasses of Concept, e.g. domain ICT will single out ICT concepts."^^xsd:string ; rdfs:subClassOf meta:Meta ; rdfs:subClassOf concept:Concept .
concept:Input rdf:type owl:Class ; rdfs:subClassOf concept:Data .
concept:Chemistry rdf:type owl:Class ; rdfs:subClassOf concept:Domain .
concept:Data rdf:type owl:Class ; rdfs:subClassOf concept:Information .
concept:Place rdf:type owl:Class ; rdfs:comment "a concrete location (a place)"@en ; rdfs:subClassOf concept:Role ; rdfs:subClassOf concept:Concept .
concept:Patient rdf:type owl:Class ; rdfs:subClassOf concept:Goal .
concept:Construction_industry rdf:type owl:Class ; rdfs:subClassOf concept:Domain .
concept:Animate rdf:type owl:Class ; rdfs:subClassOf concept:Concept .
owl:Nothing
concept:Function rdf:type owl:Class ; rdfs:subClassOf concept:Ends .
concept:Chinese rdf:type owl:Class ; rdfs:subClassOf concept:Language .
concept:Country rdf:type owl:Class ; rdfs:subClassOf concept:Place .
concept:Opinion rdf:type owl:Class ; rdfs:subClassOf concept:Information .
concept:Language_technology rdf:type owl:Class ; rdfs:subClassOf concept:Language_industry ; rdfs:subClassOf concept:Information_and_communication_technology .
concept:Agent rdf:type owl:Class ; rdfs:subClassOf concept:Source .
concept:Human rdf:type owl:Class ; rdfs:subClassOf concept:Animate .
concept:Terminology rdf:type owl:Class ; rdfs:subClassOf concept:Language_industry .
concept:Part rdf:type owl:Class ; rdfs:comment "the property of having parts and/or being part of a whole"@en ; rdfs:subClassOf concept:Role .
concept:Time rdf:type owl:Class ; rdfs:comment "What is time? if nobody asks me, I know___"@en ; rdfs:subClassOf concept:Role ; rdfs:subClassOf concept:Concept .
concept:American_English rdf:type owl:Class ; rdfs:subClassOf concept:English .
concept:Role rdf:type owl:Class ; rdfs:comment "A semantic role (filler and the synonymous property)"^^xsd:string ; rdfs:subClassOf concept:Concept .
concept:Effect rdf:type owl:Class ; rdfs:subClassOf concept:Goal .
concept:Goal rdf:type owl:Class ; rdfs:comment "to which something goes or for which something is done"@en ; rdfs:subClassOf concept:Role .
concept:Language rdf:type owl:Class ; rdfs:subClassOf concept:Information ; rdfs:subClassOf [ rdf:type owl:Restriction ; owl:minCardinality "1"^^xsd:nonNegativeInteger ; owl:onProperty concept:ctryCode ] .
concept:Location rdf:type owl:Class ; rdfs:comment "in which something is"^^xsd:string ; rdfs:subClassOf concept:Role .
concept:Multilingual_language_technology rdf:type owl:Class ; rdfs:subClassOf concept:Language_technology .
concept:Cause rdf:type owl:Class ; rdfs:subClassOf concept:Source .
concept:Parser rdf:type owl:Class ; rdfs:subClassOf concept:Software ; rdfs:subClassOf concept:Program ; rdfs:subClassOf [ rdf:type owl:Restriction ; owl:hasValue concept0:Language_technology ; owl:onProperty concept:hasDomain ] .
concept:Concept rdf:type owl:Class ; rdfs:comment "concept is a more or less language independent ontology class."^^xsd:string ; rdfs:subClassOf concept:Object .
concept:English rdf:type owl:Class ; rdfs:subClassOf concept:Language .

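
The chaining of a reasoner step and a DESCRIBE step that produced the log above can be illustrated on a toy scale. What follows is a minimal sketch under stated assumptions, not the TF tooling: a plain Python set stands in for the asserted triples of an ontology, a transitive walk over rdfs:subClassOf stands in for the reasoner (a real reasoner such as Pellet entails far more than this), and a filter on subjects stands in for the ARQ DESCRIBE step. The triples and class names are made up for the example.

```python
SUBCLASS = "rdfs:subClassOf"

# Hypothetical asserted triples (subject, predicate, object); not the real TFS.owl.
asserted = {
    ("ont:Software", SUBCLASS, "ont:Information"),
    ("ont:Information", SUBCLASS, "ont:Inanimate"),
    ("ont:Inanimate", SUBCLASS, "ont:Concept"),
    ("ont:Software", "rdfs:comment", "programs and data"),
    ("meta:Meta", SUBCLASS, "ont:Object"),
}


def entailed_subclasses(triples, top):
    """Step 1 (stand-in for the reasoner): direct and indirect subclasses of top."""
    found, frontier = set(), {top}
    while frontier:
        # Classes asserted to be subclasses of something found in the last round.
        frontier = {s for (s, p, o) in triples
                    if p == SUBCLASS and o in frontier and s not in found}
        found |= frontier
    return found


def describe_asserted(triples, subjects):
    """Step 2 (stand-in for DESCRIBE): asserted statements about the subjects."""
    return {t for t in triples if t[0] in subjects}


subclasses = entailed_subclasses(asserted, "ont:Concept")
extract = describe_asserted(asserted, subclasses)
for triple in sorted(extract):
    print(triple)
```

The extract keeps only statements that were asserted in the input, even though the selection of subjects relied on entailed subclass relations, which is the point of chaining the two steps.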
Extracting subsets from YSO using the Sesame RDF library

In an early stage of the TF project, the FinnOnto YSO thesaurus ontology turned out to be too big to handle with Protege or Jena tools in the space available on normal desktop computers. The more robust Sesame 2 RDF repository library and its query language SeRQL were used to extract manageably sized, coherent subsets from YSO around concepts matching a given pattern.

The extraction tool consisted of the following pieces:

SuperGraph.java
a Java program built on the Sesame RDF library which reads the YSO ontology, finds all concepts matching a given pattern, and extracts from YSO an upward closure of the matched concepts under the YSO schema. The superclasses of each concept are included in the closure, but narrower, related or associated concepts are not included recursively.
YSO_schema.owl
a manually extracted schema subset of YSO.owl
YSO_header.owl
an RDF header file to be included in the extract
YSO.owl
local copy of the YSO ontology
foaf.owl
local copy of the Friend of a Friend ontology referenced by YSO
skos.owl
local copy of the Simple Knowledge Organization Systems ontology referenced by YSO

The hits are marked up with the string property hit:pattern, which holds the search pattern. They can be retrieved from the extract using the Protege SWRL query tool with the query SELECT ?subj WHERE { ?subj hit:pattern ?obj } .
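
The idea of an upward closure around pattern hits can be sketched in a few lines. This is an illustration only, not the SuperGraph code: the miniature thesaurus (concept ids, labels, and broader links) is invented, shell-style wildcards from Python's fnmatch module stand in for the tool's search pattern, and a dictionary stands in for the hit:pattern markup.

```python
from fnmatch import fnmatchcase

# Hypothetical miniature thesaurus: a label and direct superclasses per concept.
labels = {
    "c1": "rakennustaide",
    "c2": "rakennusteollisuus",
    "c3": "kehonrakennus",
    "c4": "teollisuus",
    "c5": "taide",
    "top": "käsitteet",
}
broader = {"c1": ["c5"], "c2": ["c4"], "c3": ["top"], "c4": ["top"], "c5": ["top"]}

PATTERN = "*rakennus* and not *kehon*"  # kept only as the hit markup value


def matches(label):
    """Label contains 'rakennus' but not 'kehon'."""
    return fnmatchcase(label, "*rakennus*") and not fnmatchcase(label, "*kehon*")


def upward_closure(hits, broader):
    """Hits plus all their superclasses; narrower concepts are not followed."""
    closure, stack = set(), list(hits)
    while stack:
        concept = stack.pop()
        if concept not in closure:
            closure.add(concept)
            stack.extend(broader.get(concept, []))
    return closure


hits = {c for c, label in labels.items() if matches(label)}
hit_markup = {c: PATTERN for c in hits}  # stand-in for the hit:pattern property
extract = upward_closure(hits, broader)
print(sorted(hits), sorted(extract))
```

Only the superclass direction is followed, so the extract stays small even when a matched concept has many narrower or related concepts.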

The SuperGraph script is called with

java SuperGraph YSO.owl YSO_schema.owl YSO_header.owl pattern

The pattern is given as an expression in the Sesame query language SeRQL, where 'string' refers to the string value of the meta:label of a class, for example

"string LIKE \"*rakennus*\" AND NOT string like \"*kehon*\"".

Merging

One of the basic tasks in using TF is extracting some desired subset of data from one or more repositories as a dataset. This section surveys different ways of going about this task.

Extracting, combining and matching terminological ontologies using generic and TF ontology tools has been studied in connection with merging the epidemic ontologies PULS and BioCaster. The related SPARQL scripts are in TF svn in directory cnv/script .

There are two main ways of merging ontologies in TF: by rewriting, or by importing and bridging. In rewriting, the contents of the donor ontology are reshaped into TF entities, removing the original structure. In importing and bridging, a donor OWL ontology imports TFS and its concepts are related to the TFS top ontology using bridge axioms. Rewriting avoids redundancy, saves space and creates better integration, but makes concurrent development of the ontologies difficult. Importing and bridging creates redundancy, bloat and clutter, but allows ontologies to keep their identity under the merger. The two methods can also be applied in sequence, which may give something of both. Query imports offer a dynamic method of partial merging of ontologies.
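
The difference between the two methods can be made concrete with a small sketch. It is a toy illustration under assumptions, not TF code: triples are plain Python tuples, and the donor: and ont: resource names, the mapping table, and the single bridge axiom are all hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"

# Hypothetical donor ontology with its own namespace and structure.
donor = {
    ("donor:Influenza", SUBCLASS, "donor:Disease"),
}


def rewrite(triples, mapping):
    """Rewriting: reshape donor resources into TF resources; donor names disappear."""
    return {tuple(mapping.get(term, term) for term in t) for t in triples}


def bridge(triples, bridge_axioms):
    """Importing and bridging: keep donor triples intact and add bridge axioms."""
    return set(triples) | set(bridge_axioms)


mapping = {"donor:Influenza": "ont:Influenza", "donor:Disease": "ont:Disease"}
rewritten = rewrite(donor, mapping)
bridged = bridge(donor, {("donor:Disease", SUBCLASS, "ont:Concept")})

print(sorted(rewritten))
print(sorted(bridged))
```

The rewritten set is no larger than the donor but no longer mentions it, while the bridged set grows by the bridge axioms and leaves every donor statement as it was.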

The workflow in merging and matching the BioCaster and PULS epidemic-disease ontologies is described as follows. The tasks below have the form source -method-> target , where source and target are files and method is shorthand for a command line to run or for manual steps (e.g. using an editor). This describes the workflow before implementing the TF query engine. In the following section, we show how the workflow gets simplified using the TF query engine.

To run one of the *.xsl scripts, use an XSLT processor (Xalan, Saxon, xsltproc, or other). For instance, to do the first task, run the command java org.apache.xalan.xslt.Process -in BioCaster2008-04-20.owl -xsl bioc.xsl -out bioc.owl .

Ontology conversion

This section traces the steps taken to convert PULS and BioCaster ontologies to TF.

source/s                     method        target     task
BioCaster2008-04-20.owl      bioc.xsl      bioc.owl   (converts the thesaurus, i.e. the diseases)
BioCaster2008-04-20.owl      biot.xsl      biot.owl   (converts the terms)
TFS.owl, biot.owl, bioc.owl  write bridge  biob.owl   (imports and bridges biocaster)

source/s                         method        target           task
def-basic-concepts-disease.lisp  expel         puls-basic.ttl   (converts the class schema using a lisp program)
def-all-disease-list.lisp        puls.perl     puls-all.ttl     (converts the disease list using a perl script)
puls-basic.ttl, puls-all.ttl     write bridge  puls-bridge.owl  (imports and bridges puls)

source/s                                  method        target   task
TFS.owl, bio-bridge.owl, puls-bridge.owl  write bridge  epi.owl  (imports and bridges puls and biocaster into one)

Ontology merging

RDF does not do imports. One can form a SPARQL dataset of the model and its imports. Another solution is to merge the imports into one file.

source/s  method               target    task
epi.owl   Protege4 or rewrite  epi2.owl  (merge epi.owl to epi2.owl, see Note 2 below)

Matching

This section traces the steps taken to match the diseases in two ontologies. Since only disease name similarity was used in the matching, the diseases were first extracted from the leveled ontology using SPARQL and then compared.

source/s         method       target           task
epi2.owl         epi-extract  epi-extract.owl  (extracts the diseases and their English names)
epi-extract.owl  epi-match    epi-match.ttl    (compares the PULS and BioCaster disease names, see Note 1 below)

Ontology extraction

This section traces a series of steps to extract a sub-ontology describing the diseases.

source/s                                                        method        target            task
epi2.owl                                                        epi-instance  epi-instance.ttl  (use sparql to describe the matched instances)
epi2.owl                                                        epi-concept   epi-concept.ttl   (use sparql to describe the matched classes)
epi2.owl                                                        epi-term      epi-term.owl      (use pellet to extract the terms for the matches)
epi2.owl                                                        epi-class     epi-class.owl     (use pellet to extract the superclasses of the matches)
epi2.owl                                                        epi-tree      epi-tree.ttl      (use sparql to extract the superclass tree of the matches)
epi-instance.ttl, epi-concept.ttl, epi-tree.ttl, epi-term.owl   manual        epi-merge.owl     (merge the extracted parts)

Notes:

  1. The match condition in epi-match.sparql compares disease names case-insensitively using the following ad hoc similarity criterion: two names match if each matches the other as a regular expression. This is just a first stab that can be improved. Further match conditions that concern semantic properties beyond the mere label could be added.
  2. Exporting TF ontologies with Protege 4 fails when resource URIs contain non-ASCII Unicode characters. Protege 4.0 generated unused entity declarations and XML namespace prefix declarations for URIs containing such characters. Protege 4.1.0 (build 209) breaks when exporting the leveled ontology. Ontology imports can also be merged with the rewrite utility's option all.
  3. Protege 4 generates OWL 1.1 / 2.0 schema elements which Pellet version 2.0.0.rc7 does not support.
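
The ad hoc similarity criterion of Note 1 can be sketched as follows. This is an illustration, not the contents of epi-match.sparql: it assumes that "matches as a regular expression" means a case-insensitive search, as with the SPARQL regex() function, and it treats names that are not valid regular expressions as non-matching. The disease lists are invented.

```python
import re


def matches_as_regex(pattern, text):
    """True if pattern, read as a regular expression, is found in text (ignoring case)."""
    try:
        return re.search(pattern, text, re.IGNORECASE) is not None
    except re.error:
        # Assumption: a name that is not a valid regex simply does not match.
        return False


def names_match(a, b):
    """Two names match if each matches the other as a regular expression."""
    return matches_as_regex(a, b) and matches_as_regex(b, a)


# Hypothetical name lists standing in for the two extracted ontologies.
puls_names = ["Cholera", "Avian influenza"]
biocaster_names = ["cholera", "Influenza"]

pairs = [(p, b) for p in puls_names for b in biocaster_names if names_match(p, b)]
print(pairs)
```

Because the test is applied in both directions, a name that merely contains the other one does not count as a match, which keeps "Avian influenza" and "Influenza" apart in this example.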
Matching
Matching ontologies: mechanized ontology work

The sequence of ontology merging and extraction steps described above can be telescoped into one pellet4tf query. Matching disease names in the PULS and BioCaster epidemic ontologies can be listed with one TF query.

Importing the alignment to epi.owl virtually merges the matched diseases so that an OWL reasoner or editor treats each pair as one and the same entity. Alternatively, a DESCRIBE query can be written to extract a small ontology containing just the matched diseases.

The alignment is expressed as a bridge ontology. The matched ontologies can remain as distinct logical and physical entities.

Using bridges or on-demand ontology conversion, the owners of the two ontologies can be kept automatically up to date about developments in their respective TF repositories.

A master's thesis on ontology matching and merging tools in TF is in progress (Kun Ji).

The data can be combined without the component ontologies losing their independence.

With import queries, the ontologies stay in sync even though they are developed separately.

Size and speed

The combined PULS/BioCaster epidemic ontology contains about 3000 classes and 20K instances, altogether about 300K triples. Their materialization (inference closure) contains about 1M triples.

The disease match query on the merged PULS-BioCaster ontology takes a few minutes of real time on ordinary hardware.

For web use, offline processing of ontology queries and caching of results is the way to go.

Repositories can get big, but queries and entries need to remain of manageable size ("small models" approach).

Epidemic ontologies

The combined PULS/BioCaster epidemic ontology contains about 3000 classes and 20K instances, and about 300K statements. The reasoner adds 600K statements, so about 1M statements follow from the ontology.

Ontology comparison queries take 5-10 minutes of real time.

TermFactory repositories can be large, but term queries must be fast and of reasonable size.

Solution: background processing and caching of results
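
The background-processing-and-caching approach can be sketched as a small result cache. This is a minimal illustration of the idea, not a TF component: the class QueryCache, its method names, and the stand-in query function are all hypothetical, and a real deployment would also need persistence and invalidation.

```python
class QueryCache:
    """Serve precomputed query results; slow queries run only in the background job."""

    def __init__(self, run_query):
        self._run_query = run_query  # slow function: query text -> result
        self._results = {}

    def refresh(self, queries):
        """Background job: (re)compute and store the results of the given queries."""
        for query in queries:
            self._results[query] = self._run_query(query)

    def lookup(self, query):
        """Web request: return the cached result, or None if not yet computed."""
        return self._results.get(query)


calls = []


def slow_match_query(query):
    """Stand-in for a reasoner-backed query that takes minutes to run."""
    calls.append(query)
    return [("Cholera", "cholera")]


cache = QueryCache(slow_match_query)
cache.refresh(["match-diseases"])          # done offline, e.g. nightly
first = cache.lookup("match-diseases")     # fast, no reasoner involved
second = cache.lookup("match-diseases")
print(first, len(calls))
```

The web-facing lookup never triggers the slow query itself, so response time stays independent of repository size.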

Survey

This section surveys collaborative software and platforms that might suit TermFactory.

Collaborative terminology work

Collaborative software (also referred to as groupware or workgroup support systems) is software designed to help people involved in a common task achieve their goals.

Electronic communication tools send messages, files, data, or documents between people and hence facilitate the sharing of information. Examples include

  • synchronous conferencing
  • e-mail
  • faxing
  • voice mail
  • Wikis
  • Web publishing
  • revision control

Electronic conferencing tools facilitate the sharing of information, but in a more interactive way. Examples include:

  • Internet forums (also known as message boards or discussion boards) — a virtual discussion platform to facilitate and manage online text messages
  • Online chat — a virtual discussion platform to facilitate and manage real-time text messages
  • Instant Messaging
  • Telephony — telephones allow users to interact
  • Videoconferencing — networked PCs share video and audio signals
  • Data conferencing — networked PCs share a common whiteboard that each user can modify
  • Application sharing — users can access a shared document or application from their respective computers simultaneously in real time
  • Electronic meeting systems (EMS) — originally these were described as "electronic meeting systems," and they were built into meeting rooms.

Collaborative management tools facilitate and manage group activities. Examples include:

  • electronic calendars (also called time management software) — schedule events and automatically notify and remind group members
  • project management systems — schedule, track, and chart the steps in a project as it is being completed
  • workflow systems — collaborative management of tasks and documents within a knowledge-based business process
  • knowledge management systems — collect, organize, manage, and share various forms of information
  • enterprise bookmarking — collaborative bookmarking engine to tag, organize, share, and search enterprise data
  • prediction markets — let a group of people predict together the outcome of future events
  • extranet systems (sometimes also known as 'project extranets') — collect, organize, manage and share information associated with the delivery of a project (e.g. the construction of a building)
  • social software systems — organize social relations of groups
  • online spreadsheets — collaborate and share structured data and information

This section surveys concepts and solutions relevant for collaborative terminology work.

Moderation

Moderation imposes quality control on a collaborative website. Moderation means rating of contributions and ranking of users, either top down by appointed moderators or bottom up by other users. Following Wikipedia, we may distinguish between supervisor moderation and user moderation. The mission of the TF workflow solution is to enable steady growth of term quantity and quality with low supervisor moderation cost.

Supervisor moderation

Also known as unilateral moderation, this kind of moderation system is often seen in Internet forums. A group of people are chosen by the webmaster (usually on a long-term basis) to act as delegates, enforcing the community rules on the webmaster's behalf. These moderators are given special privileges to delete or edit others' contributions and/or exclude people based on their e-mail address or IP address, and generally attempt to remove negative contributions throughout the community.

User moderation

User moderation allows any user to moderate any other user's contributions. On a large site with a sufficiently large active population, this usually works well, since relatively small numbers of troublemakers are screened out by the votes of the rest of the community. Strictly speaking, wikis such as Wikipedia are the ultimate in user moderation, but in the context of Internet forums, the definitive example of a user moderation system is Slashdot.

For example, each moderator is given a limited number of "mod points," each of which can be used to moderate an individual comment up or down by one point. Comments thus accumulate a score, which is additionally bounded to the range of -1 to 5 points. When viewing the site, a threshold can be chosen from the same scale, and only posts meeting or exceeding that threshold will be displayed. This system is further refined by the concept of karma: the ratings assigned to a user's previous contributions can bias the initial rating of contributions he or she makes.

Slashdot invented a moderation system where every comment posted (including those posted anonymously) has a starting score which can be incremented or decremented by semi-randomly chosen moderators. When moderating, the moderator chooses a given descriptor (such as "insightful", "funny", "troll") and each descriptor has a positive or negative value associated with it. As such, posts not only are scored, but characterized ("20% insightful, 80% interesting"). Users can configure the value of each descriptor. The descriptors available are normal, offtopic, flamebait, troll, redundant, insightful, interesting, informative, funny, overrated, and underrated.

Moderation points added to a comment are also added to a user's karma score. Having high karma gives one bonus point to posts made by that author. (Being a registered poster adds one more, so that the highest normally achieved starting score is two).

Conversely, users with low karma have penalties imposed on them. People that post comments designed to get more karma, for example mirroring a linked article or presenting a banal groupthink opinion or lame joke, are often referred to as karma whores. Those who can moderate are selected by their karma score and number of meta moderations (and maybe other criteria). Slashdot editors, including Rob Malda ("CmdrTaco"), can moderate limitlessly. Moderator access for non-editors is time limited to a few days.

A given comment can have any integer score from −1 to +5, and Slashdot users can set a personal threshold where no comments with a lesser score are displayed. A person browsing the comments at a threshold of 1 will not see comments with a score of −1 or 0 but will see all others.
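
The scoring rules described so far are simple enough to sketch. The following is an illustration of the description above, not Slashdot's code: the function names are made up, a base score of 0 for anonymous posts is an assumption, and the penalties for low karma are left out.

```python
MIN_SCORE, MAX_SCORE = -1, 5


def starting_score(registered, high_karma):
    """One point for being registered, one more for high karma (base 0 assumed)."""
    return int(registered) + int(high_karma)


def moderate(score, delta):
    """Spend one mod point: move the score up or down by one, within bounds."""
    if delta not in (-1, 1):
        raise ValueError("a mod point changes the score by exactly one")
    return max(MIN_SCORE, min(MAX_SCORE, score + delta))


def visible(comments, threshold):
    """Comments whose score meets or exceeds the reader's threshold."""
    return [text for text, score in comments if score >= threshold]


comments = [("spam", -1), ("anonymous remark", 0), ("registered post", 1), ("insight", 3)]
print(starting_score(registered=True, high_karma=True))
print(moderate(5, +1), moderate(-1, -1))
print(visible(comments, threshold=1))
```

A reader browsing at threshold 1 sees only the last two comments, in line with the example in the text.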

A meta-moderation system was implemented to moderate the moderators and help contain abuses. Karma is implemented in the Slash Content management system and hence is generally used by all the sites that use this software. All posts with scores −1 through 3 are hidden by default when not logged in.

Meta-moderation is a Slashdot mechanism whereby a reader can volunteer to review the correctness of moderation decisions. The reader is presented with eight to ten moderation decisions made by other readers and is asked to say whether or not those moderation choices were fair, by reading the post which was moderated and considering the moderation given.

The correctness of users' initial moderations, as determined by the users who are meta-moderating them, affects how often the initial moderators are given moderation points[5], so a reader who moderates but constantly has their moderation decisions marked incorrect under meta-moderation will only infrequently be given moderation points.

Voting

Voting has many uses in collaboration software. Condorcet voting offers input from multiple experts or perspectives and may reduce intransitivity problems in decision making. In recommendation systems, rating or voting on many items can be used to formulate profiles for highly successful recommendations; and in document collaboration, such as Wikipedia, voting methods help to guide the creation of new pages.

Collaborative Protege implements two types of voting mechanisms that can be used for voting on change proposals.
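
As an illustration of Condorcet voting, the sketch below finds the candidate that beats every other candidate in pairwise comparison, if there is one. It is a generic textbook formulation, not a description of the Collaborative Protege mechanisms; the ballots and the term candidates are invented, and every ballot is assumed to rank all candidates.

```python
def condorcet_winner(ballots):
    """Return the candidate preferred to every other by a majority, or None."""
    candidates = sorted(set(ballots[0]))

    def beats(a, b):
        # a beats b if more than half of the ballots rank a above b.
        wins = sum(1 for ranking in ballots if ranking.index(a) < ranking.index(b))
        return 2 * wins > len(ballots)

    for a in candidates:
        if all(beats(a, b) for b in candidates if b != a):
            return a
    return None


# Hypothetical ballots: each expert ranks three candidate terms, best first.
ballots = [
    ["ontology", "vocabulary", "thesaurus"],
    ["ontology", "thesaurus", "vocabulary"],
    ["vocabulary", "ontology", "thesaurus"],
]
# A cyclic set of preferences has no Condorcet winner.
cycle = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]

print(condorcet_winner(ballots), condorcet_winner(cycle))
```

The second call shows the intransitivity problem: when majority preferences form a cycle, no candidate wins all pairwise contests and some tie-breaking rule is needed.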

Platforms

This section discusses using TF on different collaborative platforms.

The attitude in TF from the start was that it is not wise to get tied to any particular platform, because platform fashions come and go even faster than back end (ontology and web service) software (which is far from set in stone either). Instead, TF is to provide a relatively wear-resistant content design and workflow, a generic back end design and implementation, and a flexible front end design plus a light demonstrator of the same. However, we do need platforms, and we do not want to build them from scratch, so it is important to become clear about the choices and how to choose between them.

One of the biggest hurdles in implementing groupware is getting people to use it. If people don't feel comfortable with the software, they won't use it. Another hurdle is the desire to standardise knowledge practice across the enterprise and to implement tools and processes which support that aim. To minimise these difficulties, one approach is to provide a more abstract service API that is embedded in existing working environments, preserving their familiar look and feel and immediacy of use.

Different existing platforms may be more or less adapted for TF. Many of the platforms advertise being open to customisation or extension. Known variously as extensions, mod(ification)s, addons, or plugins, these vary from relatively straightforward items like graphics (emoticons, avatars, skins) to complete modules that provide new functionalities. Some systems try to regiment the development of such extensions, but in many cases the extension method is some kind of code patch. Java-based Atlassian Confluence seems to have a more regimented plugin programming architecture.

Multilinguality support on collaborative platforms varies from alternative interface language options to multilingual site maintenance tools. Of the open source platforms, best marks in multilinguality support have been given to Plone (German) and Drupal (Dutch). Moodle is an open source learning environment. Slashcode is a weblog or forum platform known for its moderation system.

This section surveys popular existing collaborative platforms that might host the TermFactory API. The platforms considered include the open source platforms MediaWiki, Plone, Drupal, Moodle, and Slash(dot), and the commercial platforms Alfresco and Confluence. Confluence is used as an experimental TF demo platform. The platform descriptions will focus on TF compatibility as soon as the TF architecture is fixed.

In general, a content management system (CMS) is a computer application used to create, edit, manage, search and publish various kinds of digital media and electronic text. One can distinguish between (mostly) commercial enterprise CMS and (mostly) open source web CMS. CMSs are frequently used for storing, controlling, versioning, and publishing industry-specific documentation such as news articles, operators' manuals, technical manuals, sales guides, and marketing brochures. The content managed may include computer files, image media, audio files, video files, electronic documents, and Web content. These concepts represent integrated and interdependent layers. There are various nomenclatures known in this area: Web Content Management, Digital Asset Management, Digital Records Management, Electronic Content Management and so on. The bottom line for these systems is managing content and publishing, with a workflow if required.

MediaWiki

Wikipedia is built on a version of MediaWiki. MediaWiki is a web-based wiki software application used by all projects of the Wikimedia Foundation, all wikis hosted by Wikia, and many other wikis, including some of the largest and most popular ones. Originally developed to serve the needs of the free content Wikipedia encyclopedia, today it has also been deployed by companies for internal knowledge management, and as a content management system. Notably, Novell uses it to operate several of its high traffic websites.

MediaWiki is written in the PHP programming language, and can use either the MySQL or PostgreSQL relational database management system. MediaWiki is distributed under the terms of the GNU General Public License while its documentation is released under the GFDL and partly in the public domain, making it free and open source software. A good feature in MediaWiki is that the object content pages have associated discussion pages. This layout gives good support to the collaborative management of the content. While MediaWiki comes with a basic set of features related to restricting access and defining user groups, page access control does not tend to be given high priority in development. For instance, it is extremely difficult to create a wiki where only certain users can read and access some pages. User documentation is found online only (there is no printed manual). No formal MediaWiki syntax has been defined, making a port to another language hard. There is no offline version of MediaWiki, which would enable users to update pages on their client offline, and then have those pages automatically transferred to the server when re-connected, in a similar way to the client-server operation of Lotus Notes. An offline MediaWiki client could also help users with the back-up of important pages.

There is an experimental TF MediaWiki installation at http://omatuki.dyndns.org/mediawiki/ .

Plone

Plone is a free and open source content management system built on top of the Zope application server. It is suited for an internal website or may be used as a server on the Internet, playing such roles as a document publishing system and groupware collaboration tool.

Plone is released under the GNU General Public License (GPL). Additional functionality is added to Plone with Products, which may be distributed through the Plone website or otherwise. Plone is built on the Zope application server, which is written in Python. All information stored in Plone is stored in Zope's built-in transactional object database (ZODB). Plone comes with installers for Windows, Mac OS X, and Linux, along with other operating systems. Plone is available in over 35 languages. Templates written with Cascading Style Sheets can be used to customize a website's look. Plone comes with a user management system called Pluggable Authentication Service.

There is an experimental TF Plone installation at http://omatuki.dyndns.org:8090/Plone/ .

Drupal

Drupal is a free and open source modular framework and Content Management System (CMS) written in PHP. It is used as a "back end" system for many different types of websites, ranging from small personal blogs to large corporate and political sites. The standard release of Drupal, known as "Drupal core", contains basic features common to most CMSs. These include the ability to register and maintain individual user accounts, administration menus, RSS-feeds, customizable layout, flexible account privileges, logging, a blogging system, an Internet forum, and options to create an interactive community website.

Tens of thousands of people and organizations are using Drupal to power scores of different web sites, including community web portals, discussion sites, corporate web sites, intranet applications, e-commerce applications, resource directories and social networking sites. Freely available add-on modules enable more features. Drupal is open-source software distributed under the GPL ("GNU General Public License") and is maintained and developed by a community of thousands of users and developers.

Drupal can run on any computing platform that supports:
  • a web server capable of running PHP (version 4.3.5+), including Apache, IIS, Lighttpd, and nginx.
  • a database, such as MySQL or PostgreSQL, to store content and settings.

There is an experimental TF Drupal installation at http://omatuki.dyndns.org/drupal/ .

Moodle

Moodle is a free and open source e-learning software platform, also known as a Course Management System, Learning Management System, or Virtual Learning Environment. It has a significant user base with 49,256 registered sites with 28,177,443 users in 2,571,855 courses (as of February, 2009).

Moodle is designed to help educators create online courses with opportunities for rich interaction. Its open source license and modular design mean that people can develop additional functionality. Development is undertaken by a globally diffused network of commercial and non-commercial users, streamlined by the Moodle company based in Perth, Western Australia.

Moodle contains a Glossary tool.

There is an experimental TF Moodle installation at http://omatuki.dyndns.org/moodle/ .

Among the commercially supported platforms, there are the Java-based Alfresco and Atlassian Confluence.

Alfresco

Alfresco is an Enterprise content management system for Microsoft Windows and Unix-like operating systems. Alfresco comes in two flavours. Alfresco LABS is free software, GPL licensed open source and open standards, but never officially stable. Alfresco Enterprise Edition is commercially / proprietary licensed open source, open standards and enterprise scale. Its design is geared towards users who require a high degree of modularity and scalable performance. Alfresco includes a content repository, an out-of-the-box web portal framework for managing and using standard portal content, a CIFS interface that provides file system compatibility on Microsoft Windows and Unix-like operating systems, a web content management system capable of virtualizing webapps and static sites via Apache Tomcat, Lucene indexing, and jBPM workflow. The Alfresco system is developed using Java technology.

Confluence

Confluence is a web-based wiki written in Java and mainly used in corporate environments. It is developed and marketed by Atlassian Software Systems. Confluence is sold as either on-premise software or as a hosted solution. Its license is proprietary, but a zero-cost license program is available for non-profit organizations and open source projects.

Confluence was tried out as a testbed for the first TF demo.

Confluence is used by over 6,700 organisations in more than 90 countries. Notable customers include Adobe Systems, Sun Microsystems, Johns Hopkins University and the United Nations. The current version is Confluence 2.10, launched on December 3, 2008. Confluence 2.10 features a Widget Connector for embedding web widgets from other web sites into a Confluence page. Supported web sites include YouTube, Flickr and SlideShare.

Confluence has many relevant extensions, including a comment tab theme, a vote macro, an Index Glossary plugin, and a Karma plugin.

Corporate commercial platforms include Microsoft SharePoint and Lotus Notes.

SharePoint

Microsoft SharePoint products and technologies include browser-based collaboration and a document-management platform. These can be used to host web sites that access shared workspaces and documents, as well as specialized applications like wikis and blogs from a browser. Users can manipulate proprietary controls or pieces of content called web parts to create or modify sites. SharePoint is not intended to replace a full file server. Instead, it is targeted as a collaborative workspace, a tool for the management and automation of business processes, and a platform for social networking. Microsoft markets this as Collaboration, Processes, and People. Users work with SharePoint through web interfaces such as a task list or a discussion pane. SharePoint sites are actually ASP.NET 2.0 applications, which are served using IIS and use a SQL Server database as a data storage backend. All site content data is stored within a SQL Server database called WSS_Content.

The term "SharePoint" collectively refers to two products, the platform and the services. WSS is the platform and is included with Windows Server, while MOSS provides additional services and is licensed separately. As of 2009 the most current versions of these two are:

  • Windows SharePoint Services 3.0 (WSS)
  • Microsoft Office SharePoint Server 2007 (MOSS)

Lotus Notes

Lotus Notes from IBM is one of the oldest corporate products. In the early days of the product, the most common applications were threaded discussions and simple contact management databases. Today Notes also provides blogs, wikis, RSS aggregators, CRM and Help Desk systems, and organizations can build a variety of custom applications for Notes using Domino Designer. Notes' integration with IBM's Sametime instant messaging allows users to see other users online and conduct chat sessions with them. A chat session can be with one person or multiple people.

Since version 7, Notes has provided a web services interface. Domino can be a web server for HTML files too; authentication of access to Domino databases or HTML files uses Domino's own user directory and external systems such as Microsoft's Active Directory. A design client is available to allow rapid development of databases consisting of forms, which allow users to create documents; and views, which display selected document fields in columns. In addition to being a groupware system (e-mail, calendaring, shared documents and discussions), Notes/Domino is also a platform for developing customized client-server and web applications. Its use of design constructs and code provides capabilities that facilitate the construction of "workflow" type applications (which may typically have complex approval processes and routing of data).

Web editing

This section discusses tools for editing TF term entries on different platforms.

A Web server is a computer program that delivers (serves) content, such as Web pages, using the Hypertext Transfer Protocol (HTTP), over the World Wide Web. A web client (user agent) is a piece of software which is able to request (and send back) content from web servers and act on it, e.g. render it on a screen for viewing. A server answers outside requests, a client makes them. Web browsers are clients for web servers.

The Web started read-only. Much of it is still that way. Content is prepared in html on a (web-)local host using editors that are installed and accessed locally (relative to web protocols) and new pages are installed on a web server locally. A browser fetches the page from the server and shows it on the screen. Pages are static, clients do not change them (directly locally, or indirectly through server requests). (For a resume of the history see this Opera article .)

Local web page editors may use http to download and upload (publish) pages on the server. This is still offline editing. The same service is provided in many XML editors. At one time, there was a wave for browsers to double as web page editors (e.g. Netscape Composer, Amaya). Now web page editors have become a separate business catering for page designers.

An established way for users to edit content is through HTML forms. A form is an element on an HTML page that appears writable in a browser. The filled-out form (the contents of its fields) can be submitted to a server (action) linked on the form. The server acts on the contents, e.g. by preparing another page and sending it back to the browser. This HTML form-action loop provides one basic way to edit a page through the web. Pages that change as they are viewed or edited through a client are known as dynamic web pages.
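The form-action loop can be sketched with the Python standard library. This is only an illustration, not TermFactory code: the field name term, the /edit action and all function names are made up for the example. The server side receives the submitted field contents and prepares a new page with the value filled in.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# A page with one editable field; "term" and "/edit" are hypothetical names.
FORM_PAGE = """<html><body>
<form method="post" action="/edit">
  <input type="text" name="term" value="{value}"/>
  <input type="submit" value="Save"/>
</form>
</body></html>"""


def render_page(term):
    """Return the form page with the field prefilled (HTML-escaped)."""
    return FORM_PAGE.format(value=escape(term, quote=True))


def handle_submission(body):
    """Act on a urlencoded form body and prepare the page to send back."""
    fields = parse_qs(body)
    term = fields.get("term", [""])[0]
    return render_page(term)


class FormHandler(BaseHTTPRequestHandler):
    """GET serves the empty form, POST closes the form-action loop."""

    def _send(self, page):
        data = page.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._send(render_page(""))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._send(handle_submission(self.rfile.read(length).decode("utf-8")))


def serve(port=8000):
    """Start the demo server; call this by hand to try the loop in a browser."""
    HTTPServer(("localhost", port), FormHandler).serve_forever()
```

Calling serve() and opening http://localhost:8000/ in a browser shows the loop: each submit sends the field to the server, which answers with a freshly generated page.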

On the server, some software takes care of processing the requested action. A large variety of different technologies exist. One of the oldest is the Common Gateway Interface (CGI), which provides a place to store scripts (programs in some scripting language, like Perl) that can do form actions. Some web server platforms provide a built-in language for server side programming, like Apache PHP and Java servlets.

Web site management has created a new marketplace for editing tools. A content management system is installed on a web server and managed mainly through the web. Such systems provide administrative interfaces for creating collections of pages on a web site. In this case, editing happens over the web, between the server and a client (a browser). Web editing solutions can be classified by the underlying software technologies (e.g. open-source Perl, Python, Ruby, LAMP, AJAX, or Microsoft .NET, IBM Lotus).

An opposite direction from server side scripting is to move work to the client end (browsers). This reduces load on servers so response times get better. The traffic between the server and client is lowered as the server only needs to send instructions for the browser to build a page. On the other hand, there are many different browsers, so it becomes more difficult to make sure that content is rendered correctly by all clients.

The onslaught of client side programming has contributed to the change of web page looks from austere text to an interactive TV-like experience. On the other hand, the JavaScript (r)evolution is still ongoing, and clients (including human ones) vary in their acceptance of the technology. Client side dynamic pages enable what is known as mashup, where composite pages are built in the client with ingredients from several servers (e.g. Google Maps). The leading client side scripting language is JavaScript (aka ECMAScript). There are many program libraries available for it. (For a client side ontology browser written in JavaScript, see jOWL.)

To avoid obsolescence and vendor dependence, there are a few things to try. First, one can stick to established standards and technologies that are likely to be supported by all platforms. Second, one can aim to provide just-below-the-surface plugin-type APIs which leave only a shallow level of platform specific code for different and forthcoming platform implementations to do. With this in mind, we have looked at editing solutions that support W3C standards, in particular X(HT)ML.

XForms

XForms 1.0 (Third Edition) was published on 29 October 2007. The original XForms specification was made an official W3C Recommendation on 14 October 2003, while XForms 1.1, which introduced a number of improvements, reached the same status on 20 October 2009.

Unlike the original HTML forms, XForms follows a Model-View-Controller approach. The "model" consists of one or more XForms models describing form data, constraints upon that data, and submissions. The "view" describes what controls appear in the form, how they are grouped together, and what data they are bound to. CSS can be used to describe a form's appearance.

An XForms document can be as simple as an HTML form (by only specifying the submission element in the model section, and placing the controls in the body), but XForms includes many advanced features. For example, new data can be requested and used to update the form while it is running, much like using XmlHttpRequest/AJAX except without scripting. The form author can validate user data against XML Schema data types, require certain data, disable input controls or change sections of the form depending on circumstances, enforce particular relationships between data, input variable length arrays of data, output calculated values derived from form data, prefill entries using an XML document, respond to actions in real time (versus at submission time), and modify the style of each control depending on the device they are displayed on (browser versus mobile versus text only, etc.). There is often no need for any scripting with languages such as JavaScript.

Like legacy forms, XForms can use various non-XML submission protocols (multipart/form-data, application/x-www-form-urlencoded), but a new feature is that XForms can send data to a server in XML format. XML documents can also be used to prefill data in the form. Because XML is a standard, many tools exist that can parse and modify data upon submission, unlike the case with legacy forms where in general the data needs to be parsed and manipulated on a case by case basis. XForms is itself an XML dialect, and therefore can create and be created from other XML documents using XSLT. Using transformations, XForms can be automatically created from XML Schemas, and XForms can be converted to legacy XHTML forms: this is basically how server side XForms work today.
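The difference between the two submission styles can be illustrated on the receiving end. The sketch below uses only the Python standard library; the element and field names (entry, term, lang) are invented for the example and are not part of XForms or TF. A legacy body is a flat list of key-value pairs, while an XML body arrives as a document that any XML tool can parse.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def parse_legacy(body):
    """Parse an application/x-www-form-urlencoded body into a flat dict."""
    return {key: values[0] for key, values in parse_qs(body).items()}


def parse_xml_instance(body):
    """Parse an XML instance document; one child element per form field."""
    root = ET.fromstring(body)
    return {child.tag: (child.text or "") for child in root}


legacy = parse_legacy("term=black+hole&lang=en")
xml_data = parse_xml_instance(
    "<entry><term>black hole</term><lang>en</lang></entry>"
)
```

Both calls yield the same dictionary here, but only the XML form can carry nested or repeated structure without ad hoc naming conventions for the fields.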

At the time of this writing, no widely used web browser supports XForms natively. However, various browser plugins and client-side extensions exist. The Firefox XForms extension is part of the Mozilla Project, is compatible with releases of Firefox and Mozilla, and, as of version 0.8.4 (February 6th, 2008), is built with the nightly builds on most platforms. XForms 1.0 SE support is not complete but covers most of the specification, with the notable exception of attribute-based repeating used in HTML tables. The extension is available for both Firefox 2 and Firefox 3. There are two projects for a free client-side implementation that needs neither a plug-in nor installation: Ubiquity XForms and XSLTForms.

HTML forms are being worked over in HTML 5 .

X(HT)ML web editing

Instead of going through (X)(HT)ML forms, many web platforms use tools that allow editing X(HT)ML documents or elements either in the raw or through some skin (CSS or other stylesheet).

Amaya

The W3C Amaya web browser is also an X(HT)ML editor. The Amaya X(HT)ML editor is based on XTiger, an XML markup language for marking up editable regions in XML documents. As of spring 2010, the XTiger specification is in the process of developing into the XTiger XML language specification. Amaya's XTiger implementation only really supports XHTML and a few other special document types (MML, SVG). It does not yet work for general XML documents. TF entries in XHTML format can be edited in Amaya, but it is not easy to provide TF editing tools like TF menus beyond basic XHTML. For instance, Amaya 11.13.1 (Dec 2009) can edit XHTML but lags behind in the support of dynamic HTML (CSS2, JavaScript). The latest version of Amaya dates from 2009.

Many Wiki platforms may provide WYSIWYG editing through third party editor plugins. Many are javascript based X(HT)ML editor plugins like WYMEditor , CKEditor , TinyMCE , XOpus or CodeMirror . For a comparison, see here .

Wiki editing

Many Wikis provide wysiwyg style textarea editors (see OperaWiki ), also referred to as client-side Rich text editors, WYSIWYG editors and Iframe editors. To the user, it is like having MS Word, Front Page, Dreamweaver or any other WYSIWYG HTML editor, built into a webpage. For textarea editor listings see htmlarea.com or GeniiSoft.com .

The details of embedding in Wiki pages vary depending on platform and implementation language. Some wikis are cagey about such inserts for security reasons.

MediaWiki has a survey of the state of the art of editing in MediaWiki and some other Wiki platforms.

(As of 2009) there is no available 'ready-to-go' package for incorporating full WYSIWYG into the MediaWiki software. The problem is that any WYSIWYG editor would have to know wikitext grammar, and no full grammar for wikitext exists - the "parser" doesn't parse, it's a twisty series of regular expressions. So present WYSIWYG editors either have to (a) reverse-engineer as much of a grammar as they can, or (b) forget wikitext and just write HTML. A proper grammar is not sufficient for a proper WYSIWYM editor, as opposed to WYSIAYG (what you see is all you get), but it is necessary. A proper grammar is a highly-desired thing for many other purposes as well, and present efforts are at "promising vapourware" status.

There are a few options for making a WYSIWYG editor that works in the browser.

  1. Pure DHTML/Javascript. Capture mouse input, buttons, keystrokes, etc., and actually edit the HTML of the current document. It's not trivial, but possible.
  2. Create a custom browser plug-in, Java applet, ActiveX control. This would probably be workable, but would take quite a bit of hackery, and may or may not work. Requiring users to install any sort of plugin is very undesirable.
  3. Both Mozilla and Internet Explorer have included ways to make sections of a page editable. IE 5.5 had the MSHTML Editing Platform (archive.org), and Mozilla has its Rich Text Editing API (called midas, see demo ). Both technologies allow Web developers to make parts of a page editable -- in slightly different ways, of course.

Most current in-browser WYSIWYG editors use the third option.

HTML 5

HTML5 is the proposed next standard for HTML 4.01, XHTML 1.0 and DOM Level 2 HTML. It aims to reduce the need for proprietary plug-in-based rich internet application (RIA) technologies such as Adobe Flash, Microsoft Silverlight, Apache Pivot, and Sun JavaFX. HTML5 incorporates Web Forms 2.0 , another WHATWG specification. According to the W3C timetable, it is estimated that HTML5 will reach W3C Recommendation by late 2010, but it is likely to be late. However, parts of the recommendation are already being implemented.

In HTML 5 menus can be nested inside other menus to create hierarchical menus. See also HTML 5 contenteditable attribute .

HTML 5 will help bring some standards to in-place web editing. There are two ways in HTML 5 to create an editable section on a web page: the designMode and contentEditable properties. A window or frame is turned editable by setting the designMode property on the document object to "on". Typically an edit box is made using an IFrame in designMode. Any element containing text can be made editable by setting its contentEditable property to true. (contentEditable is supported in Firefox 3 and IE, Opera and Safari.)

Here is a demo of in-place editing using javascript and the contenteditable property.

At the present time, there are few web X(HT)ML editing tools that support both WYSIWYG structured X(HT)ML editing and standard (X)HTML/CSS menus. One thing we can do for now is provide a TF2X(HT)ML JavaScript editor plugin based on one of the generic ones. The main features to add are loading and saving of entries and the capability to browse existing values from ontologies to fill editable fields.

Applications

TF in CF

Exploit CF synergy of web harvesting and crowdsourcing:

  • Multilingual multidomain intelligence gathering
    • For new languages:
      • Finnish, Swedish
      • Russian, Chinese
    • For not "just" facts but also concepts, terms and expressions:
      • PULS
      • Likey
      • OntoR
  • Potential new applications come from MOLTO tools:
    • translation lexicon acquisition
    • ontology verbalization
TF in MOLTO

EU MOLTO 2010 - 2013

  • project for multilingual ontology-based online MT
  • Will use TF for online ontology and lexicon acquisition
MOLTO TT

Molto Translation Tools

The overall design of the MOLTO WP3 Translation Tools prototype is shown here. The core is a client side MOLTO translation editor embeddable to a third party translation management platform like GlobalSight, supported by web service APIs that take care of document, grammar and term management, GF translation, and ontology services.

MOLTO Translation Tools

Equivalents editor

The MOLTO project is to use TF as middleware between domain ontologies and GF grammar. The MOLTO translation editor includes a tab for editing multilingual term equivalents. Content can be sent to the translation editor in the form of a TF localization file. This raises the question of how to communicate the json content back to TermFactory. The section on json format details one way.

The changes made in an equivalents editor client can be carried back to a matrix TF ontology maintaining those terms using the TF web API. One RESTful solution is as follows. First convert the json back to TF using a query like

http://localhost:8080/TermFactory/query?url=changes.json&schema=TFS.owl&rw=relabel&f=TURTLE&cacheDB=true

The result is saved in the default TF database. Then use the following query to edit the matrix ontology with the cached equivalents ontology.

http://localhost:8080/TermFactory/edit?m=matrix&d=original&a=changes.json&cacheDB=true

Here, matrix stands for the URL of the matrix ontology, original for the previous version of the equivalents, and changes.json is the database location of the new converted equivalents ontology. As a result, an updated copy of the matrix ontology appears in the database, from which it can be fetched to TermFactory Wiki.
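A client can assemble the two request URLs programmatically. The following Python sketch only builds the URL strings; the parameter names (url, schema, rw, f, cacheDB, m, d, a) are those of the example queries in this section, while the function names and the TF_BASE constant are assumptions made for the illustration.

```python
from urllib.parse import urlencode

# Base address of a local TermFactory webapp, as in the examples above.
TF_BASE = "http://localhost:8080/TermFactory"


def query_url(source, schema="TFS.owl", rewrite="relabel",
              fmt="TURTLE", cache_db=True):
    """URL that converts a json file back to TF and caches the result."""
    params = {
        "url": source,
        "schema": schema,
        "rw": rewrite,
        "f": fmt,
        "cacheDB": str(cache_db).lower(),
    }
    return f"{TF_BASE}/query?{urlencode(params)}"


def edit_url(matrix, original, changes, cache_db=True):
    """URL that edits the matrix ontology with the cached equivalents."""
    params = {
        "m": matrix,
        "d": original,
        "a": changes,
        "cacheDB": str(cache_db).lower(),
    }
    return f"{TF_BASE}/edit?{urlencode(params)}"


step1 = query_url("changes.json")
step2 = edit_url("matrix", "original", "changes.json")
```

Using urlencode rather than string concatenation ensures that ontology URLs passed as parameter values are escaped properly and that no stray whitespace ends up in the request.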

Status 2012

Done by Jan 2012

  • tfs.cc server open. Services provided:
    • TF Mediawiki
    • Query form
    • Edit form
    • Web directory
    • SPARQL endpoint
  • Front end overhaul
    • Fewer settings to worry the user
    • Finer control on entry layout
    • Completely localizable
    • Forms also in Finnish
    • Slightly less forbidding looks?
  • Ontology work
    • TF en-fi WordNet faster and more readable
    • ICD-10 ontologized and TF searchable
  • New tools forthcoming (MOLTO)
    • OWLIM heavy duty database backend
    • Equivalents editor
To do 2012

These tasks to do in 2012

  • Populate tfs.cc platforms and repositories
  • Deploy TF to the CF Procomm and MOLTO user communities
  • Integrate TF to MOLTO and Drupal platform
  • Integrate tabular editor

TF resources

TermFactory Manual

In many browsers you can choose page style from a built-in menu. As of end of 2011, Firefox, IE, and Opera have page style menus. Chrome does not, but the manual's on-page style menu seems to work. The print settings are defined in stylesheet print.css that gets imported by the other styles when css media is print (when the document is printed or viewed in the browser's print preview mode).

The slides of the Manual are generated with xslt script slide.xsl . An example of a Ubuntu command line is

xsltproc slides.xsl TFManual25en.xhtml > slides.xhtml

The table of contents of the Manual is generated with xslt2 script identity.xsl . An example of a Ubuntu command line is

saxonb-xslt -xsl:identity.xsl -ext:on -s:TFManual26en.xhtml

The result file is generated in the same directory with name TFManual_en.xhtml .

The TF white paper can be generated from the manual with xslt script white.xsl using a xslt processor. The white paper is linked under name TFmanual.html . The latest version of the complete manual is linked under name TFManual_en.xhtml . Slides only work from a file with extension .xhtml in Firefox.

TF config files

Currently, TermFactory config files are held in directory $TF_HOME/etc , where $TF_HOME stands for the TermFactory installation directory or, in the case of web services, the webservice root. Set environment variable TF_HOME (in /etc/environment or equivalent) to point to the TF installation directory. The files in this directory so far are

.:
apache2 apache virtual server settings directory
links.n3 jena location mapping file used in TermFactory XHTML hyperlink redirection
location-mapping.n3 jena location mapping file used in TermFactory URI redirection
log4j.properties log4j settings for TF commandline scripts
logging.properties java logging settings for TF commandline scripts
tbx-mapping.rdf TBX conversion mapping file
ont-policy.rdf Jena ontology policy for TF
owlprefix.sparql this file is for pellet4tf -D to map TF prefixes to URLs
pellet.properties TF PELLET query engine settings
prefix.default.sparql TF default prefixes for queries
prefix.sparql TF prefixes for queries
templates XHTML output template directory
tf2xhtml.css XHTML entry default css file
tf2xhtml.js XHTML entry default javascript file
tf.default.properties TF default option settings
tfd.sparql sparql DESCRIBE query for class instances by pattern
tf.properties TF option settings
tfp.sparql sparql DESCRIBE query file for PELLET engine
tfq.sparql TF generic pattern select query template (SPARQL engine)
tfs.sparql sparql DESCRIBE query file for SPARQL engine
tomcat tomcat directory settings
wn WordNet settings directory
xhtml xhtml module directory
xhtml1-strict.dtd xhtml document type definition
./apache2:
repos.php php directory index script
sites-enabled apache2 enabled sites directory
./apache2/sites-enabled:
000-default apache2 sample TF virtual server definition
./templates: XHTML output template directory listing
exp0.ttl lemma oriented template
exp.ttl lemma oriented template for expressions
ont0.ttl concept oriented template
ont.ttl concept oriented template for concepts
sem.ttl meaning oriented template
term0.ttl term oriented template
term.ttl term oriented template for terms
./tomcat:
log4j.properties log4j settings for TF services
logging.properties java logging settings for TF services
./wn:
fi.json Finnish localization file for WordNet
vunl2wnfi.n3 WordNet fi location mapping file
vunl2wn.n3 WordNet location mapping file
wn2vunl.n3 WordNet location mapping file
wnlinks.txt WordNet hyperlink mapping file
wnlmap.txt WordNet location mapping list file
./xhtml:
xhtml1-strict.dtd
xhtml-attribs-1.mod
xhtml-base-1.mod
xhtml-bdo-1.mod
xhtml-blkphras-1.mod
xhtml-blkpres-1.mod
xhtml-blkstruct-1.mod
xhtml-charent-1.mod
xhtml-csismap-1.mod
xhtml-datatypes-1.mod
xhtml-edit-1.mod
xhtml-events-1.mod
xhtml-form-1.mod
xhtml-framework-1.mod
xhtml-hypertext-1.mod
xhtml-image-1.mod
xhtml-inlphras-1.mod
xhtml-inlpres-1.mod
xhtml-inlstruct-1.mod
xhtml-inlstyle-1.mod
xhtml-lat1.ent
xhtml-link-1.mod
xhtml-list-1.mod
xhtml-meta-1.mod
xhtml-metaAttributes-1.mod
xhtml-object-1.mod
xhtml-param-1.mod
xhtml-pres-1.mod
xhtml-pres-1.mod.1
xhtml-qname-1.mod
xhtml-rdfa-1.dtd
xhtml-rdfa-model-1.mod
xhtml-ruby-1.mod
xhtml-script-1.mod
xhtml-special.ent
xhtml-ssismap-1.mod
xhtml-struct-1.mod
xhtml-style-1.mod
xhtml-symbol.ent
xhtml-table-1.mod
xhtml-target-1.mod
xhtml-text-1.mod

The Jena policy file etc/ont-policy.rdf is a facility for mapping fully specified ontology uris to alternative urls. A TF site uses this file to list the ontologies it maintains. It also contains version information. The uri mappings in this ontology policy file are applied before prefix mappings. (Jena rdf location mappings are looked up for other non-ontology rdf files too, including rdf-format settings files.) If models you get are not those you wanted, there are a lot of places to check: webserver mappings, web container (tomcat) mappings, Jena ont-policy (both mappings and caching policy), Jena location mappings, TF_NOTRY option.

TF command line script option verbose turns on logging in a fixed set of classes. For more control over log messages, the files etc/logging.properties and etc/log4j.properties hold logging settings for TF command line scripts in io/script directory. The command line scripts get the location of the configuration file from java properties, see the scripts in io/script. The locations of the log4j.properties , logging.properties and pellet.properties files can be set with Java properties log4j.configuration , java.util.logging.config.file and pellet.configuration , respectively. These properties can be set as -D java command line options to the java runtime start command. Example:

TF_HOME=/home/lcarlson/Data/CF/TF
export JAVA_OPTS=" -Dpellet.configuration=file:$TF_HOME/etc/pellet.properties"
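When several of these properties are needed at once, the option string can be generated. The Python sketch below is an illustration only: the three Java property names are those given in this section, but the helper names and the choice of file: URLs versus plain paths for the values are assumptions to be checked against the scripts in io/script.

```python
def java_opts(properties):
    """Turn a mapping of Java system properties into -D options."""
    return " ".join(f"-D{name}={value}" for name, value in properties.items())


def tf_logging_opts(tf_home):
    """Options pointing the three config properties at files under etc/.

    Assumption: log4j and pellet take file: URLs (as in the example above),
    java.util.logging takes a plain file path.
    """
    etc = f"{tf_home}/etc"
    return java_opts({
        "log4j.configuration": f"file:{etc}/log4j.properties",
        "java.util.logging.config.file": f"{etc}/logging.properties",
        "pellet.configuration": f"file:{etc}/pellet.properties",
    })


opts = tf_logging_opts("/home/lcarlson/Data/CF/TF")
```

The resulting string can be exported as JAVA_OPTS in the same way as in the shell example.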

Another pair of logging property files in etc/tomcat/ get included in the service archives when the services are built. Logging settings on installed services and web applications can be changed at runtime by editing copies of these property files at root level of the service aar (for services) and TermFactory.war/WEB-INF/classes (for TermFactory webapp).

Pellet and Jena mostly use java logging. TF code mostly uses the log4j logger. The log4j.properties file must be on the java classpath.

TF properties

TF options are by default read from TF property file $TF_HOME/etc/tf.properties . The location of the property file can be overridden with java property tf.configuration . For instance, command line scripts can use etc/tf.default.properties by adding to the java command line a parameter like -Dtf.configuration=${TF_HOME}/etc/tf.default.properties .

# TF PROPERTIES ###############
# Default TF_HOME
# TF_HOME = "";
# Default site url for this site
#
TF_SITE_PREFIX = http://tfs.cc
# Default location of TF properties file
#
#TF_PROPERTIES = file:etc/tf.properties
# TermFactory default ontology policy URL
#
#TF_ONT_POLICY = file:etc/ont-policy.rdf
# Default list of TF location mapping files delimited by \n. A url ending in .txt is a file containing such a listing. Lines starting with # ignored. Examples:
#
# first url\n
# listing.txt\n
# last url
TF_LOCATION_MAPPING = file:etc/location-mapping.n3
# TermFactory default database connection URL
#
# TF_DBCONNECTION_URL = jdbc:mysql://localhost/tfdb
# Default list of repositories to search for TF query engine. A url ending in .txt is a file containing such a listing. Lines starting with # ignored. Examples:
#
# first url\n\
# listing.txt\n
# last url
# TF_TDB_DIR = file:tdb
# TermFactory default tdb dataset assembler file
#
# TF_TDB_ASSEMBLER = file:etc/tf-assembler.ttl
TF_REPOS = http://tfs.cc/owl/TFS.owl
# Default site list for TF sites. A url ending in .txt is a file containing such a listing. Lines starting with # ignored. Examples:
#
# first url\n\
# listing.txt\n
# last url
TF_SITES = http://localhost:8080/TermFactory/query
# TermFactory describe query files as URLs to sparql files
#
TF_PELLET_QUERY = file:etc/tfp.sparql
TF_SPARQL_QUERY = file:etc/tfs.sparql
# TF DESCRIBE query recursion depth
# Allowed values: nonnegative integers
# Default is 0 (no recursion)
TF_DESCRIBE_DEPTH = 2
# TF DESCRIBE query classification depth
# Allowed values: nonnegative integers or -1
# Default is -1 (unlimited depth)
TF_DESCRIBE_UP = 2
TF_DESCRIBE_DOWN = 2
# PelletClassify4TF classification depth
# Allowed values: nonnegative integers or -1
# Default is -1 (unlimited depth)
TF_CLASSIFY_UP = 2
TF_CLASSIFY_DOWN = 2
# Invert flag in pellet4TF classify
TF_INVERT = false
# TermFactory output options
TF_OUTPUT_FORMAT = TURTLE
TF_OUTPUT_ENCODING = UTF-8
# TermFactory version checking on (default true)
#
#TF_VERSIONINFO = true
# TermFactory default sparql query prefixes
#
#TF_PREFIX_SPARQL = file:etc/prefix.sparql
TF_QUERYDB = true
# TermFactory default cacheDB setting (false)
#
TF_CACHEDB = false
# TermFactory default queryFS setting (true)
#
TF_QUERYFS = true
# TermFactory default cacheFS setting (false)
#
TF_CACHEFS = false
# TermFactory default notry setting (false)
#
# TF_NOTRY = false
# TermFactory default nowait setting (false)
#
# TF_NOWAIT = false
# XHTML default locals setting
#
# TF_READALL = true
# default readAll setting (true)
#
TF_LOCALS = file:owl/
# XHTML default links setting
#
TF_LINKS = file:etc/links.n3
# XHTML default location mapping max (not set by default)
#
# TF_HOPS = 5
# XHTML default lang setting (not set by default)
#
# TF_LANG = en
# default XHTML schema setting (not set by default)
#
# TF_SCHEMA = http://tfs.cc/owl/TFS.owl
# default Stacked engine axioms setting
#
TF_AXIOMS = http://tfs.cc/owl/TFS.owl
# TF default webdav dir
#
TF_INDEX_URL = http://localhost/webdav/
# TBX default mapping
#
TF_TBX_MAPPING = file:etc/tbx-mapping.rdf
# XHTML default template
#
TF_TEMPLATE = file:etc/templates/ont.ttl
# XHTML default template root (not set by default)
#
# TF_ROOT = ont:Concept
# TF default query engine
#
TF_DEFAULT_ENGINE = SPARQL
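A property file of this shape is easy to inspect outside Java. The Python sketch below is a deliberately simplified reader, not the parser TF uses: it handles only KEY = value lines and # comments, and ignores line continuations and other features of the Java properties format. The function names are made up; the tf.configuration override follows the description above.

```python
def parse_properties(text):
    """Read KEY = value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def properties_location(tf_home, java_properties=None):
    """Default location, unless Java property tf.configuration overrides it."""
    java_properties = java_properties or {}
    return java_properties.get(
        "tf.configuration", f"{tf_home}/etc/tf.properties"
    )


sample = """# TermFactory output options
TF_OUTPUT_FORMAT = TURTLE
#TF_VERSIONINFO = true
TF_DESCRIBE_DEPTH = 2
"""
settings = parse_properties(sample)
```

Commented-out settings such as TF_VERSIONINFO do not appear in the result, so their built-in defaults apply.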

Locations are given in the configuration file as URLs or paths. As a special case, urls or paths of form file://foo or just //foo are resolved against the environment variable TF_HOME if that is set; else against configuration file property TF_HOME. If TF_HOME is not set or "", //foo resolves to current directory, i.e. ./foo . The location of the pellet configuration file can also be overridden by setting java property pellet.configuration to point to another location.
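The resolution rule for file://foo and //foo locations can be sketched as follows. This Python function is an illustration under assumptions, not the TF implementation: it treats an empty TF_HOME the same as an unset one, and it leaves every other kind of location untouched.

```python
import os
import posixpath


def resolve_location(location, config=None, environ=None):
    """Resolve file://foo or //foo against TF_HOME.

    Lookup order: environment variable TF_HOME, then configuration file
    property TF_HOME, then the current directory.
    """
    environ = os.environ if environ is None else environ
    config = {} if config is None else config
    for marker in ("file://", "//"):
        if location.startswith(marker):
            relative = location[len(marker):]
            home = environ.get("TF_HOME") or config.get("TF_HOME") or "."
            return posixpath.join(home, relative)
    return location  # other URLs and paths are returned as they are


from_env = resolve_location("//foo", environ={"TF_HOME": "/opt/tf"})
from_config = resolve_location("//foo", config={"TF_HOME": "/srv/tf"}, environ={})
fallback = resolve_location("//foo", environ={})
```

With TF_HOME set nowhere, the last call gives ./foo , as described in the text.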

The TF_REPOS option gives the default data set to search by the site query service. This option is used when the query does not list the dataset explicitly. The TF_REPOS parameter is either a url to a list of repositories, or a hard-coded list of uris delimited by "\n".

The TF_SITES option gives the default sites to broadcast a query to. The TF_SITES parameter is either a url to a list of sites, or a hard-coded list of uris delimited by "\n".
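How such a list value might be expanded is sketched below in Python. The conventions (entries delimited by newlines, a url ending in .txt naming a file that contains a further listing, lines starting with # ignored) are taken from the comments in the property file; the function itself and the read_listing callback that fetches a listing are hypothetical. The sketch does not guard against listings that include each other.

```python
def expand_listing(value, read_listing):
    """Expand a TF_REPOS / TF_SITES style value into a flat list of urls.

    read_listing is a caller-supplied function mapping the url of a .txt
    listing to its text content.
    """
    urls = []
    for line in value.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if entry.endswith(".txt"):
            urls.extend(expand_listing(read_listing(entry), read_listing))
        else:
            urls.append(entry)
    return urls


# Stand-in for fetching listings over the web; the urls are examples only.
listings = {
    "http://example.org/listing.txt": "# a comment\nhttp://example.org/b.owl\n",
}
repos = expand_listing(
    "http://example.org/a.owl\nhttp://example.org/listing.txt\nhttp://example.org/c.owl",
    listings.__getitem__,
)
```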

The TF DESCRIBE query classification depth is controlled with options TF_DESCRIBE_UP and TF_DESCRIBE_DOWN . The PelletClassify4TF taxonomy depth is controlled by TF_CLASSIFY_UP and TF_CLASSIFY_DOWN . The TF_INVERT flag controls the direction of classification triples in RDF and JSON output. The default (false) is to write triples bottom up (from particular to general), using properties rdfs:subClassOf and rdf:type . The invert option writes triples directed top down (from general to particular), using TF specific meta properties meta:hasSubClass and meta:instance . There is an analogous punned inverse of rdfs:subPropertyOf called meta:hasSubProperty for listing subproperty relationships top down.
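The effect of the invert flag on classification triples can be shown on plain tuples. In this Python sketch the property names are those given in the paragraph above; the function, the tuple representation of triples, and the sample resource names are assumptions made for the illustration.

```python
# Bottom-up property -> top-down TF meta property, as listed in the text.
INVERSE = {
    "rdfs:subClassOf": "meta:hasSubClass",
    "rdf:type": "meta:instance",
    "rdfs:subPropertyOf": "meta:hasSubProperty",
}


def invert(triples):
    """Rewrite classification triples from bottom-up to top-down direction."""
    result = []
    for subject, prop, obj in triples:
        if prop in INVERSE:
            # swap subject and object and use the inverse property
            result.append((obj, INVERSE[prop], subject))
        else:
            result.append((subject, prop, obj))
    return result


bottom_up = [
    ("ont:Dog", "rdfs:subClassOf", "ont:Animal"),
    ("ont:Fido", "rdf:type", "ont:Dog"),
    ("ont:Fido", "rdfs:label", "Fido"),
]
top_down = invert(bottom_up)
```

Triples with other properties, like the label in the example, pass through unchanged.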

The default output format is used by the TF query engine. The default output character encoding UTF-8 can be changed to UTF-16. Pellet4TF tries to guess the input character encoding of sparql files from the beginning of the file.

TF keeps its ontology policy file ont-policy.rdf at TF_HOME/etc . This file can be used to redirect ontology URLs to local files on a long term basis. The location mappings defined in ont-policy.rdf are included into the TF retry location mappings.

Boolean setting TF_VERSIONINFO turns on or off version checking of models in the database cache against version info given in the ontology policy file ont-policy.rdf . When true (the default), the TF location mapping retry facility warns about a model whose versionInfo differs from versionInfo associated to the same URI in the ont-policy file.
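The version check amounts to comparing two versionInfo values per URI. A minimal Python sketch of that comparison follows; it assumes the cached models and the ont-policy entries have already been reduced to dictionaries from URI to versionInfo string, which is not how TF stores them, and the function name and sample values are invented.

```python
def version_warnings(cached_versions, policy_versions, check=True):
    """Return warnings for cached models whose versionInfo differs from
    the versionInfo recorded for the same URI in the ont-policy file."""
    if not check:  # corresponds to TF_VERSIONINFO = false
        return []
    warnings = []
    for uri, cached in cached_versions.items():
        expected = policy_versions.get(uri)
        if expected is not None and expected != cached:
            warnings.append(
                f"{uri}: cached versionInfo {cached!r} differs from "
                f"ont-policy versionInfo {expected!r}"
            )
    return warnings


policy = {"http://tfs.cc/owl/TFS.owl": "2.6"}
cache = {"http://tfs.cc/owl/TFS.owl": "2.5"}
report = version_warnings(cache, policy)
```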

The file URL option TF_PREFIX_SPARQL names a SPARQL query file from which pellet4tf copies default namespace prefixes, to simplify writing TF SPARQL queries. This allows conventional qualified names of TF URIs to be used in SPARQL queries. For instance, to describe the country China, one can just submit the query DESCRIBE ont1:China . (One can also declare prefixes in the query prolog, or use full URLs like <http://tfs.cc/ont1/China> .) The same abbreviatory convenience applies to the pellet4tf describe-resource command line option -D .
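The prefix-copying idea can be sketched as follows. This Python fragment is an illustration only and does not reproduce how pellet4tf does it: the function name add_default_prefixes is hypothetical, the regular expression covers only simple PREFIX declarations, and prefixes already declared in the query are assumed to take precedence.

```python
import re

# Matches simple declarations such as: PREFIX ont1: <http://tfs.cc/ont1/>
PREFIX_RE = re.compile(
    r"^\s*PREFIX\s+([\w.-]*):\s*<([^>]*)>", re.IGNORECASE | re.MULTILINE
)

def add_default_prefixes(query, prefix_source):
    """Prepend prefixes found in prefix_source that the query lacks (sketch)."""
    declared = {name for name, _ in PREFIX_RE.findall(query)}
    missing = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in PREFIX_RE.findall(prefix_source)
        if name not in declared
    ]
    return "\n".join(missing + [query])
```

Given a prefix file that declares ont1 as <http://tfs.cc/ont1/> , the bare query DESCRIBE ont1:China is expanded with that declaration, while a query that already declares ont1 itself is left unchanged.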

The Boolean option TF_QUERYDB tells the retry facility whether to query the default database cache for ontologies (the default is true). The Boolean option TF_CACHEDB tells the retry facility whether to cache ontologies in the default database cache (the default is false). The configured settings can be overridden by command line scripts.

TF_TBX_MAPPING points to the RDF document containing the mapping rules used in conversions between TF and the TBX format.

TF_TEMPLATE sets the default template used to write XHTML entries.

TF_ROOT sets the default root filter (one or more instances/classes) to include in XHTML entries.

TermFactory schema (TFS.owl)

The URL of the current version of the TermFactory OWL schema is http://tfs.cc/owl/TFS.owl . It can be downloaded from that URL (for instance, with a browser or an OWL editor like Protege).

Listing of TF schema vocabulary

The following table is a listing of the TF schema vocabulary with comments. It was generated with pellet query c-comment.sparql using command line pellet4tf query -o RDF/XML -q c-comment.sparql ../owl/TFS.owl > comments.owl and converted to html with script comment.xsl using command line xsltproc comment.xsl comments.owl > comments.html . These commands can also be run with shell script doc/comment.sh .


List of converted vocabularies

4m.owl
the 4M project ontology (ca 2000 entities) converted to TF (a one-time conversion) using a purpose-built Java Jena OWL converter FM2TF.java .
space.owl
the mobilite space ontology (ca 400 classes) downloaded from web and imported using a bridge ontology
yso-raken.owl
a library taxonomy of Finnish construction terms (ca 70 concepts) extracted from FinnOnto thesaurus ontology using sesame rdf
paperi.owl
A fi-de-en paper industry terminology converted from MultiTerm format to TF over TBX (ca 200 concepts)
rakli8.owl
A fi-en building maintenance glossary (ca 100 concepts), converted from PDF via TBX.
ECHA-glossary.owl
A multilingual glossary of EU chemical terms (ca 200 concepts), converted from CSV exports of Excel sheets using a dedicated Perl script.
kyamk.owl
A multilingual glossary of customs terms collected by KYAMK (ca 300 concepts), converted from CSV exports of Excel sheets using a dedicated Perl script.
puls.owl
The PULS epidemics ontology converted from lisp sources using lisp and perl into Turtle format (ca 3000 terms).
puls-locations.owl
The PULS country ontology converted from lisp source using lisp into Turtle format (ca 1700 terms).
bio.owl
The BioCaster multilingual epidemics ontology converted from owl rdf/xml source using xslt (ca 8000 terms).
cml.owl
An en-fi-ru EU project ontology converted from html source into TF using xslt (ca 5000 terms).
sost.owl
An en-fi-ru welfare vocabulary converted from Windows Word content xml into TF using xslt (ca 800 concepts, 7000 terms).
hs.owl
Another multilingual welfare vocabulary converted from fixed-width tabular format through TBX into TF using Perl and xslt (ca 1000 concepts, 9000 terms).
TFCtry-??.owl
ISO 3166 country codes and names converted from Wikipedia using XSLT and TF. en-fi-ru-zh, ca 250 codes per language, 2000 terms, 20K triples.
TFLang-??.owl
ISO 639 language codes plus English and vernacular language names converted from Wikipedia using XSLT and TF. 185 codes, 368 terms, ca 11K triples.
TFLang-??.owl
TermFactory subject field classification converted from MSWord over OpenOffice wiki text export and Perl. fi-en-zh, 97 headings, 591 subheadings, 4570 terms, ca 11K triples.

TermFactory sample ontologies

These links point to sample TermFactory ontologies. They import the TF schema plus possibly other ontologies. They can be downloaded and/or inspected with OWL tools like Protege .

The sample ontologies are copyrighted and confidential.

TermFactory ontology
The TermFactory ontology is an example of a layered ontology that contains information about expressions relevant for automatic natural language processing.
KYAMK ontology
The KYAMK glossary is a raw import of a recent multilingual glossary of customs terms.

ContentFactory wikipages

The ContentFactory wiki pages for TermFactory are at

http://sysdb.cs.helsinki.fi/~tkt_plus/twiki/bin/view/ContentFactory/TermFactory

TermFactory source code repository

The following URL points to the TermFactory Subversion code repository. Currently, it contains all of the TermFactory source code and documentation. Access requires username and password.

https://svn.it.helsinki.fi/repos/termfactory/

External libraries (Jena, Pellet) are kept separately and included into the download through svn:externals definitions.

Currently, the root structure of the code base is

cnv doc etc fe io owl txt viz ws

More specifically:

cnv
conversions between TF and other formats
doc
TF manual and other external documentation of TF
etc
TF setting files
fe
TF front-end code
io
TF back-end i/o and the query engine
owl
TF ontologies
txt
TF internal developer notes
viz
TF graphical visualizer
ws
TF web services

Version history

  • version 2.6 31.05.12 gate service
  • version 2.5 01.01.12, 12.02.12 pellet 2.3.0, tdb database
  • version 2.4 24.12.11 TF API docs 2.4
  • version 2.3 12.12.11 xhtml input templates
  • version 2.2 15.11.11 xhtml output templates
  • version 2.1 18.08.11 convert4tf, tfs.cc server version
  • version 2.0 14.12.10 rewrite as service, virtual ontologies
  • version 1.9 29.11.10 index services, EditServlet and QueryForm v. 1.9., TFStrict
  • version 1.8 30.10.10 Separated TFProp, TFDom, TFCtry, TFLang, TF profiles
  • version 1.7 15.09.10 TF for WordNet, separated TFL10n
  • version 1.6 10.06.10 June 2010 demo
  • version 1.5 19.05.10 TF self documentation by reflection
  • version 1.4 13.04.10 TF profiles, front end design
  • version 1.3 16.03.10 TF edit services, repository network
  • version 1.2 15.01.10 TF webapp: query service with retry of location mappings; 20.03.10 TFS v 1.2
  • version 1.1 15.12.09 query imports using query URLs
  • version 1.0 08.11.09 tf io merged, 21.11.09 TF backend v 1.0 completed, 07.12.09 TF2XML format
  • version 0.9 28.10.09 describe queries
  • version 0.8 21.09.09 ontology matching
  • version 0.7 05.04.09 TF3 format, more conversions
  • version 0.6 03.04.09 June 2009 demo
  • version 0.5 03.04.09 White paper
  • version 0.4 01.04.09 TF query import specification
  • version 0.3 25.02.09 TF schema in OWL DL, Protege 4 + Fact++, Pellet check ok
  • version 0.2 02.02.09 TF schema, TermFactory, Protege 3.3.1
  • version 0.1 01.11.08 TF schema, ECHA demos
  • version 0.0 07.02.07 rakli8.owl converts and loads

Some bibliography

Table of contents

  1. Introduction
    1. What TF TermFactory is all about
    2. The TF architecture
    3. TermFactory philosophy
    4. Ontologies
      1. RDF
      2. RDFS
      3. OWL
        1. OWL semantics
        2. OWL properties
        3. OWL versions and profiles
        4. Instances, classes, properties and roles
  2. The TermFactory ontology schema
    1. TF top ontology
      1. TF algebra
      2. TF ontology subsets
    2. TermFactory properties
      1. Term properties
      2. Descriptions
        1. Definitions
      3. Concept properties
        1. Connotations
      4. Expression properties
      5. Other properties
    3. TF linguistics
      1. Morphology in TF
        1. Inflection
        2. Derivation
    4. TF natural language semantics (TFSem.owl)
      1. Verbs
      2. Nouns
      3. Adjectives
      4. Adverbs
      5. Polysemy
      6. Top ontologies
        1. WordNet
        2. SUMO
        3. FinnOnto
    5. TF localisation
      1. TF JSON content localization files
      2. The localize script
      3. Localization queries
        1. How localization works
        2. How to add localizations
  3. Ontologies in the Web
    1. Reasoning in TF
      1. Property inheritance in OWL
      2. Rules
        1. Property inheritance with SWRL
        2. Role maps
      3. Querying as reasoning
        1. SPARQL
        2. SPARQL Update
    2. Ontology imports
      1. Bridge ontologies
      2. Importing by query
      3. TF DESCRIBE queries
    3. Naming resources in TF
      1. Descriptive identifiers
      2. TermFactory URIs
        1. Hash vs. slash vocabularies
        2. URI abbreviations
        3. TF3 encoding
      3. TF namespaces
      4. Resources vs. literals
        1. Formatted text and the XMLLiteral datatype
    4. Redirection in TermFactory
      1. TF location mappings
        1. TF built in pseudo schemes
        2. User defined pseudo schemes
        3. TF filename mapping conventions
      2. apache2 URL rewriting
      3. Tomcat URL rewriting
  4. Formats
    1. Semantic Web file formats
    2. TF entries
      1. Difference between terminology and lexicography
    3. TF formats
      1. XHTML
        1. XHTML templates
        2. XHTML entry schema
        3. XHTML active ontology
        4. XHTML original and edits
        5. XHTML localization
        6. XHTML links
        7. XHTML CSS stylesheet
        8. XHTML javascript
        9. XHTML headers
      2. TF2XML deprecated v. 1.9
      3. TF3
      4. JSON
      5. TBX
      6. LMF
      7. MultiTerm
  5. Conversion
    1. TBX to TF conversion
    2. TF to TBX conversion
    3. TF to cparse conversion
    4. MultiTerm2xhtml and MultiTerm2FO
    5. WordNet conversion
    6. Ad hoc conversion to TF
    7. TF schema profiles
      1. TF Lite
      2. Separation axioms
    8. TF profile conversion
      1. Virtual ontologies
  6. Repositories
    1. Persistent repositories
    2. Revision control
      1. OWL file formats
      2. Normal forms
      3. Entry normal form
      4. TF triple format
      5. TF entry diff
      6. Date and user
    3. Efficiency
      1. Caching
        1. DB cache
        2. Directory index
        3. Building a URL directory index
        4. Building a class directory index
        5. CMS cache
        6. Tomcat native library
      2. Indexing
        1. DB index
        2. Repository directory index
        3. TF webdav indexing
        4. Separate index
        5. Indexing and ontology factoring
      3. Mirroring of ontology documents
      4. Content negotiation for entries
      5. Cache updates
  7. Tools
    1. TF tools
      1. TF tool workflows
      2. Retry utility
      3. Rewrite utility
      4. The TF reasoner Pellet4TF
      5. edit4tf
      6. index4tf
    2. Editors
      1. XML editors
        1. XMLmind XML editor
      2. RDF/OWL editors
      3. Swoop
      4. Protege
        1. Protege plugins
        2. Collaborative Protege
        3. ProSE
      5. TopBraid Suite
      6. NeON Toolkit
        1. NeOn DIG plugin
        2. OwlSight
        3. Ontology Browser
        4. OntoTrack
        5. OntoWiki
      7. Comparing OWL editors for TF
    3. Validators
      1. W3C RDF validation service
      2. WonderWeb OWL validator
      3. Validation with reasoners
    4. Reasoners
      1. Jena ARQ and SPARQL query language
      2. Fact++
      3. Pellet
        1. Pellet Internal API
        2. Manchester OWLAPI
        3. Jena API
        4. Advanced Pellet programming
      4. HermiT
      5. OWLIM
    5. Visualizers
      1. The TermFactory Visualizer TFVisu
      2. Bramble
    6. Software libraries
      1. OpenRDF Sesame
      2. Jena Library
      3. OWL API Library
  8. Services
    1. TF web services concept
    2. TF API
      1. The TF web services
        1. TFServices
        2. QueryService
          1. Relayed queries
          2. Broadcast queries
        3. EditService
        4. IndexService
        5. GateService
        6. SparqlService
        7. EditForm servlet
      2. TF web application
        1. QueryForm servlet
          1. QueryForm parameters
          2. Sample TF query string parameters
        2. EditForm servlet
          1. EditForm parameters
      3. Logging
  9. User interfaces
    1. The mockup
      1. TermFactory Search
      2. TermFactory View
      3. TermFactory Forum
      4. TermFactory Wiki
    2. The actual front end
      1. Login
      2. Search
        1. Query form
      3. Edit
        1. Editor
        2. Edit form
        3. CKEditor
        4. TF autocomplete lists
        5. TF input templates
        6. TF front end localization
      4. Wiki
      5. Comment
        1. Disqus
  10. Workflows
    1. Collaborative terminology work vs. traditional terminology
    2. Ontology work
      1. Extracting
        1. Extracting subsets using Jena and Pellet
        2. Extracting subsets from YSO using the Sesame RDF library
      2. Merging
        1. Ontology conversion
        2. Ontology merging
      3. Matching
        1. Ontology extraction
        2. Matching ontologies (mechanized ontology work)
        3. Size and speed
  11. Survey
    1. Collaborative terminology work
      1. Moderation
        1. Supervisor moderation
        2. User moderation
      2. Voting
    2. Platforms
      1. MediaWiki
      2. Plone
      3. Drupal
      4. Moodle
      5. Alfresco
      6. Confluence
      7. Sharepoint
      8. Lotus Notes
    3. Web editing
      1. XForms
      2. X(HT)ML web editing
        1. Amaya
      3. Wiki editing
      4. HTML 5
  12. Applications
    1. TF in CF
    2. TF in MOLTO
      1. Molto Translation Tools
      2. Equivalents editor
      3. Done by Jan 2012
      4. Done by Jan 2012
      5. Done by Jan 2012
      6. These tasks to do in 2012
    3. TF resources
      1. TermFactory Manual
      2. TF config files
      3. TermFactory schema (TFS.owl)
      4. Listing of TF schema vocabulary
      5. List of converted vocabularies
      6. TermFactory sample ontologies
      7. ContentFactory wikipages
      8. TermFactory source code repository
    4. Version history
    5. Some bibliography
    6. Table of contents
    7. Subject index
    8. Figure Index

    Subject index



    A
    active ontology
    adjective
    architecture
    apache2 url rewriting
    axioms (stacked query engine)
    B
    TF back end
    BPEL
    broadcast queries
    C
    caching
    TF CKEditor
    compact URI (CURIE)
    conversion
    connotations
    cparse
    D
    definition
    DESCRIBE queries
    descriptive identifiers
    diffing entries
    repo directory index
    dialogue system
    E
    editors
    TF edit API
    web editing
    edit4tf utility
    editing anonymous items
    edit service
    efficiency
    encoding
    TF entries
    equivalents editor
    TF Compact
    F
    FinnOnto project
    formats
    formatted text
    TF Full
    G
    gate service
    H
    hash (fragment) redirection
    HermiT reasoner
    ontology hell
    home ontology
    I
    importing
    TF indexing service
    inheritance
    input templates

    J
    json format
    json localization files
    K
    L
    libraries
    TF libraries
    LISA
    TF Lite
    TF link mappings
    TF location mappings
    TF localisation
    TF json localisation file format
    login to TF
    M
    materialization (deductive closure)
    EU MOLTO project
    morphology
    MultiTerm
    N
    NeON Toolkit
    NLP interface
    normal forms
    O
    ontology
    ontology work
    ontology policy
    TF options
    original ontology
    OWL
    P
    Pellet
    Pellet4TF
    persistence
    polysemy
    TF schema profiles
    TF profile conversion
    Protege
    TF pseudo schemes
    punning

    Q
    query engine
    querying
    query imports
    query form
    query service
    R
    RDF
    RDFS
    realization (type inference)
    reasoner
    redirection of web addresses
    references to literature
    TF reflection
    relayed queries
    TF resource directory conventions
    retry utility
    rewrite utility
    revision control
    roles
    rules
    S
    TF top ontology
    TF general language semantics
    serialisation
    TF site
    sparql endpoint
    SUMO
    Stacked query engine
    standards
    T
    TBX
    XHTML entry templates
    TF3 format
    TF3 encoding
    TF schema (TFS.owl)
    TF tools summary
    Tomcat URL rewriting
    top ontologies
    U
    UDDI
    URI conventions
    V
    validator
    TF schema vocabulary
    virtual TF ontologies
    visualizers
    versioning

    W
    WSDL
    TF web services API
    WordNet
    WordNet bridge
    X
    XHTML format
    XMLLiteral datatype
    XMLmind
    Y
    YSO
    Z
    Zope

    Figure Index



    A
    architecture
    sample ARQ reasoner log
    B
    C
    Caching
    Class index
    COGKS architecture
    Confluence glossary
    Confluence karma
    Confluence voting
    TF Full contrastive terms
    TF Lite contrastive terms
    TF conversion
    D
    DESCRIBE query
    E
    TF edit form
    TF editable attribute
    entry query script
    F
    G
    H
    I
    imports
    import query
    import result
    input template
    input template definition

    J
    K
    L

    Localized TF entry
    M
    Sample ontology match query log
    Multilingual Desktop
    TF front end mockup
    TF screen mockups
    MOLTO design
    Moodle Glossary
    N
    O
    TF options
    P
    collaborative Protege
    sample Pellet log

    Q
    TF query form
    R
    RDF/XML
    TF repository network
    TF repository design
    S
    SPARQL query
    TF services overview
    TF subset graph
    system design
    T
    TMF metamodel
    TBX sample
    TF3 sample
    TF terms
    TF tools
    U
    URL index
    TF URL rewriting
    V
    visualizer
    TF schema vocabulary

    W
    TF Axis2 web services
    TF web application
    WordNet sample
    X
    XHTML sample
    Y
    Z


    (End of file)