TermFactory slides
Slides
Title
Factory

TermFactory Manual

© Lauri Carlson 2007-2014

TermFactory
Helsinki, May 1, 2014

Introduction

This document describes the TermFactory approach and architecture. With suitable styling, it serves as a white paper, a status report, a journal, and a manual. Style dev shows everything, for TF developers. Of the other styles, style slide is a slideshow. Style intro is for those who just want to read about TermFactory. Style guide shows a front end user guide. Style user shows the back end tools. Style admin shows an administrator's manual.

This sentence starts the introduction. It skims some main aspects of TermFactory.

This sentence starts the book. It expounds the theory of TermFactory.

This sentence starts the user manual. The user manual explains what words and terms and other terminology content look like in a TermFactory term ontology. Then it gives general advice about how to maintain TF terms. It contains a chapter on commandline tools for offline terminology work.

This sentence starts the user guide. The guide explains the use of TermFactory web toolkit. It contains a chapter on the TF web toolkit and one on TermFactory workflows.

This sentence starts the admin manual. It includes information needed to set up and administer a TermFactory site.

This sentence starts the full manual. It includes everything.

Text in grey supplies background information - Not Invented Here.

Text in blue is blueprint - what is or was in the plans.

Text in green is the green paper - what might be nice to have.

Text in red is deprecated - for historical interest.

About

What TermFactory (TF) is all about

TermFactory is an architecture and a workflow for Semantic Web based, multilingual, collaborative terminology work. What this means in practice is that it applies Semantic Web and other document and language technology standards to the representation of multilingual special language terms and the related concepts, and provides a plan for how such terminologies can be collected, updated, and agreed upon by professionals, not only terminology professionals, all over the globe, during their everyday work on virtual work platforms over the web. As a whole, TF could be termed a semantic web framework for multilingual terminology work.

TF provides

  • ontology and terminology formats
  • format conversions
  • repositories
  • query and edit tools
  • web services
  • collaboration tools

for people to work on terms jointly or separately, building on the results of the work of others, while maintaining quality and consistency between the different contributions.

There is not much that TermFactory invents out of whole cloth. Yet the totality is novel, in that there is no terminology management solution in existence that comes even close to doing what TF promises to do. Being a combination of existing and tried technologies, TF is not science fiction. Yet it is a complex enough totality that it needs careful planning and thoughtful implementation. This evolving document is a report of that planning and implementation process. It is to be complemented with more easily digestible guides as the product matures.

A model implementation of a TermFactory community platform has been built. We try to keep the design as free of platform specific detail as possible. The main research focus is on making Semantic Web technology useful for global multilingual terminology work.

As for workflow, TF tries out these ideas in an experimental terminology network, and by so doing devises a guide of best practices as to how the collaborative distributed work pattern can be made to produce useful results with the least hassle. There are many practical questions to solve, not only technical ones but ones having to do with community building, access rights, authentication, division of labor, organisation and timing of different phases of work, allocation of authority, intellectual property rights, etc. Many of these issues it shares with other collaborative work processes, so TF does not have to invent everything, but rather survey and choose suitable solutions.

One is perhaps tempted to ask, what is new here, what is the gain from TF? There have been a lot of terminology data storage solutions before. There are more efficient uses of relational databases than current persistent ontology repositories. There are a lot of standards and tools for terminological data management, including XML based ones like TBX. There are old and well-tried query and reasoning tools, including SQL and Prolog. What is there to gain from using another burgeoning technology instead of the old tricks? Is it just the novelty value?

The expected benefits of TF as compared to current state-of-the-art terminology tools include

Benefits
The benefits from TF
  • Openness and conformance
    • With ontology standards and tools, both conceptual and linguistic content can be globally identified and mechanically validated.
  • Flexible reuse of content
    • Allow third party ontologies to coexist and (co)develop on separate sites.
    • Different terminology styles can coexist and (co)develop in the same repositories.
  • Ease of implementation and deployment
    • Contents are usable by third party ontology tools
    • Help divide and conquer big ontologies

Ontology tools are not about runtime efficiency in (say) online web applications. They save human work and make possible the management of much larger collections of terminological data than before. Ontology work happens in the background.

The TF architecture

The TF architecture is not embodied in one software package, but an array of standards and software, both web and standalone tools, that allow different actors in different roles to collaborate to produce a shared resource. This shared resource is a distributed multi-domain special language ontology of multilingual terminology. It can be used to organise content and standardise communication in global multilingual organisations, enhance exchange of ideas and innovations across a multinational workforce, or facilitate understanding and support education across language barriers. The core of the system is formed by a hierarchy of Semantic Web ontologies served by a network of interconnected repositories. The human interfaces to the network are composed of online web collaborative work platforms and offline professional terminology and ontology tools.

TF pyramid

The TF pyramid is not necessarily cuneiform in reality, it can be more of a cumulus cloud. It is drawn with a triangle here just because there are many more special concepts than general ones, and many more people working at the lower rungs than at the top. The divisions of the cone depict both the composite structure of the content and the division of work between experts of different things.

Show/hide TF pyramid

TF Pyramid

TermFactory philosophy

From a philosophical perspective, all communication, including multilingual communication, involves translation, that is, conversion from one format to another. TF is specifically about interlingual translation of terms: internationalization of local content to make it globally shared, and localization of globally shared content back home.

A familiar notion that helps explain TermFactory philosophy is this (Carlson 1989). We choose names or addresses we use to identify persons (places) depending on the size of the field of search. In the middle of a conversation, we point or use pronouns. In talk among family and friends, we use first names and nicknames. Within a wider circle, full names are used. In official contexts, some ID is required. When there are none, a description is offered. In each context we use the name (address) that is most convenient among those that sufficiently identify the object. This also covers naming across cultures and languages: we switch to another code that works best in that context. Though TF primarily deals with multilingual terms, it is really about globalisation and localization of names of any kind.

From this point of view, the simple answer to what TermFactory (TF) is about is this: it is a tool for facilitating the choice of names and addresses to suit changes of context. Starting with a local ordinary name for a thing in some language, it allows narrowing down the intended meaning to a special domain concept that has a global identifier, a URI, or better yet, an international identifier, an IRI. This global concept is in turn localized to terms and expressions in another language or culture. The facilities offered by TermFactory for interlingual intercultural communication conform to classical terminology theory, but are not confined to terminology only. Considered as an abstract machine, TermFactory creates a complex of string to string mappings, mediated by a semantic network expressed in description logic.

Over the past few years, one of the most important shifts in the digital world has been the move from the wide-open Web to semiclosed platforms that use the Internet for transport but not the browser for display. It’s driven primarily by the rise of the iPhone model of mobile computing, and it’s a world Google can’t crawl, one where HTML doesn’t rule. And it’s the world that consumers are increasingly choosing, not because they’re rejecting the idea of the Web but because these dedicated platforms often just work better or fit better into their lives (the screen comes to them, they don’t have to go to the screen). The fact that it’s easier for companies to make money on these platforms only cements the trend. Producers and consumers agree: The Web is not the culmination of the digital revolution. (Quote from Wired Aug 2010)

The brand new mobile world goes for icons. People are served menus of global icons, to avoid localization. TF swims against the mainstream: it is a safe haven for localized strings. This is not a trivial distinction.

In a more mathematical vein: two category theoretic insights underlie much of TF.

  1. Translations are mappings, morphisms in a category of languages. Globalisation, or interlingual translation of terms through the composition of internationalization followed by localization (i18n o L10n), uses the interlingua of language independent concepts as a category theoretic limit (universal element) of that category.
  2. Universal elements are good for space efficiency and manipulation. Limits in a category use one thing to represent many, taking out the slack from redundant mappings.

Space (thereby maintenance) efficiency is the motivation for having global names in the first place. That is also the ultimate motivation in terminology theory for distinguishing concepts from terms. The TF distinction between terms and expressions is its dual, motivated by multilinguality.

All tasks in TF involve format conversion, and many tasks involve globalisation. URIs are global. Expressions in a (sub)language are local. TF tries to make them global, by giving them URIs. At the same time, it goes to no end of trouble to make that global content locally usable. It is efficient and convenient in a given context to name anything with the shortest name that is unique in it. When the context is large, the names get longer. TF aims to make both optimisations possible at once: to globalize, and to localize at will.

Globality is the motivation of the Semantic Web. Its weakness is the downside of globalisation, namely loss of local (time) efficiency. Going through a limit is space/resource efficient, like a tree is resource efficient compared to a graph, but it is time consuming for the same reason. Having direct routes is faster if cost is no object. In the web, fast traffic is peer to peer. In the short run, nobody has the patience to look for long URIs, and nobody cares to check if a given local thing already exists globally. Result: a lot of duplication, proliferation of different URIs for the same thing. The homonym problem globalisation sets out to solve is replaced by an equally intractable synonym problem. In a word, ontology hell.

The TermFactory architecture is meant to address this problem. (It cannot solve the problem at large, just think of the scale. It is enough if TF helps tame it in smaller circles.) The solution is something like this: before inventing another global concept associated to a local name, look up in TF what there already is under that local name, and borrow or subclass one of the existing concepts. If nothing fits, share your innovation.

It is important to observe that there is relatively little in the mechanism of TF that is specifically hardcoded about multilingual terms. That is all in the data (the ontologies). Mostly, TF is a bunch of Semantic Web mechanisms to do mappings between strings using description logic based techniques.

Another category theory idea is the Leibnizian duality: an individual "is" the set of all of its properties. Transposed to resources and ontologies, an ontology resource "is" the ontology of all triples about it: the ontology which distinguishes it from other resources. This insight can be seen as the underlying motivation of the Semantic Web addressing orthodoxy , by which a resource URI should resolve to an ontology that defines that resource.

The resource/ontology duality can be relative to some projection, a universe of discourse or vocabulary, in which case the individuating ontology can be smaller. Most ontologies are too diverse to individuate just one resource, they are about many resources, but the TF notion of an entry approximates the idea of an individuating ontology.

Applying the resource/ontology duality twice, we get back to the original resource, or more generally map resources to resources. If we loosen the projection on the way, we might get from a resource to a set of resources indistinguishable from it through the projection. If the projection is localization, we get from a resource to its localization vocabulary, and from that ontology to the corresponding resources in it. This thought is put into application further below.

Ontologies

An ontology, in the Semantic Web sense of the word, is a collection of semantic descriptions of concepts in a formal language explicit enough to be processed by a machine.

Ontologies are direct descendants of the semantic networks of 1970s artificial intelligence. AI systems used semantic networks to describe the world to a machine so that the machine could behave intelligently. There was nothing wrong with the idea in itself, it just turned out to be unfeasible to describe enough of the world to make the machines behave intelligently enough, and the AI hype died out.

The idea got a new lease of life by the hype surrounding the Semantic Web, a new generation Internet where intelligent machines are able to understand and process information produced by and for people. It is supposed to upgrade the current human-interactive Web 2.0 to a human-machine interactive Web 3.0 .

At least for the time being, machines are worse than people at guessing meaning (IBM's success in TV trivia notwithstanding). To make information accessible to them, people have to be more explicit about what they mean than they have to be when communicating with other humans. Specifically, meanings must be annotated, marked up using some metalanguage in web documents for machines to read.

The first step in the 1990s toward man-machine readability was XML, the eXtensible Markup Language, which makes document structure explicit to people and machines. XML gives documents a treelike structure but no particular semantics; the meaning, if any, is up to the user to provide. The ontology languages want to also fix meaning.

The base language for the Semantic Web is the Resource Description Framework language RDF. It is a language for constructing semantic networks as labeled graphs, not just trees like XML. RDF is actually independent of XML, although RDF graphs are by default written in XML. The native "language" of RDF is statements in the form of subject-predicate-object triples, for instance (1)

ont:Fido rdf:type ont:Dog .
ont:Dog rdfs:subClassOf ont:Pet .
ont:Sue ont:owns ont:Fido .

saying that Fido is a dog, dogs are pets, and Sue owns Fido. RDF graphs are semantic networks built out of such triples.

In recent years, attention in Semantic Web activities has been concentrated on building an infrastructure for the semantic web out of RDF datastores. They are known under the catchwords of The Web of Data, or the Linked Data Cloud.

The web being global, RDF allows identifying the concepts that appear in the triples with global identifiers, called, depending on variety, URLs (uniform resource locators), URIs (uniform resource identifiers), IRIs (internationalized resource identifiers) or URNs (uniform resource names). For instance, the dog's name ont:Fido above is an abbreviation for the full TermFactory name of a particular globally unique dog, http://www.tfs.cc/ont/Fido . There are other Fidos in the world, but only one by this name. Global identifiers are one of the main advantages of RDF that make TermFactory possible. Using global identifiers (URLs, URIs, IRIs or URNs), concepts can be identified uniquely and traced back to their owners. In the above example, the meaning of the predicates rdf:type and rdfs:subClassOf is fixed (and explained) by the W3C consortium. The rest of the named entities are owned and documented by TermFactory. Another site might have its own Sue and Fido or a different concept of ownership, identified by different URIs.

The more semantics an ontology language has, the more meaning it can express and the fewer statements it needs to express it. But then all the consequences are no longer explicitly stated, and to unpack what is only implied, an inference engine, or reasoner, is needed. For instance, from (1) a reasoner can infer that Fido is a pet, or that Sue owns a pet, though these facts are not explicitly stated in the ontology. The richer the language, the more it can express, but the harder its inference problem, so the trick in defining an ontology language is to find a useful balance between expressive power and tractability.
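To make the point concrete, such an entailment can be checked with a SPARQL ASK query evaluated under an RDFS (or OWL) entailment regime; a minimal sketch, with the ont: prefix expanding as in the Fido example above:

PREFIX ont: <http://www.tfs.cc/ont/>

# true under RDFS entailment over (1), although the triple is not asserted
ASK { ont:Fido a ont:Pet }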

RDF

RDF is a language for labeled graphs. An RDF graph can be visualized as a binary graph made up of labeled nodes and arcs. A model is a wrapper around a graph, providing it with i/o and other conveniences. A SPARQL dataset is a collection of RDF graphs, consisting of a default graph and zero or more named graphs. A datastore is a dataset that can be updated with SPARQL update queries. Compare the section on TF listings on the need to specify the context.
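For illustration, a small SPARQL dataset can be written in TriG roughly as follows; the graph name is just a placeholder, and the triples reuse the Fido example:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ont:  <http://www.tfs.cc/ont/> .

# triples outside any GRAPH block belong to the default graph
ont:Dog rdfs:subClassOf ont:Pet .

# one named graph; the name <http://example.org/graphs/pets> is a placeholder
GRAPH <http://example.org/graphs/pets> {
  ont:Fido a ont:Dog .
  ont:Sue ont:owns ont:Fido .
}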

Show/hide RDF graph

figure 1

Such an RDF graph consists of triples, each consisting of two nodes connected by a labeled arc. The nodes can be labeled or blank (anonymous). An RDF graph need not be connected. RDF graphs can be named. The arc labels are called properties. RDF nodes are either resources or literals. Literals, for instance strings, numbers, or dates, have a label, but no URI and no properties. Resources may have a URI and properties. (They need not have either.) A resource with a URI is a named resource; one without is an anonymous resource, alias blank (node). Blank nodes have fixed identity only inside the containing graph, so they need to travel with their containing graph. Constant (blank-free) RDF triples and graphs are completely self-contained, globally identified pieces of information that can be bartered between documents and ontologies. This is one of the strengths of RDF.

There is a large variety of formats to represent an RDF graph as text. Some are described in the section on TF formats.
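For instance, the single triple saying that Fido is a dog looks as follows in three common serializations (Turtle, N-Triples, and RDF/XML), with the namespace expanding as in the Fido example above:

# Turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ont: <http://www.tfs.cc/ont/> .
ont:Fido rdf:type ont:Dog .

# N-Triples
<http://www.tfs.cc/ont/Fido> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.tfs.cc/ont/Dog> .

# RDF/XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://www.tfs.cc/ont/Fido">
    <rdf:type rdf:resource="http://www.tfs.cc/ont/Dog"/>
  </rdf:Description>
</rdf:RDF>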

Reification

RDF statements are identified by subject, predicate, and object. Statements are not resources, so they cannot have properties, except by way of introducing more triples. (It might have been better to start with quads, making RDF a form of modal logic. Compare N-Quads.)

RDF has a standard way to allow assigning properties to a triple, namely, by way of statement reification. An RDF Statement is a bundle of four triples that identifies a triple and associates to it its subject, predicate, and object as properties. The following two are equivalent:

ont:Sue ont:owns ont:Fido .

[ rdf:type rdf:Statement ;
  rdf:subject ont:Sue ;
  rdf:predicate ont:owns ;
  rdf:object ont:Fido ;
  meta:context <http://foo> ] .

The reification can have further properties, for instance administrative information or modalities.

Alternatives to standard RDF reification have been proposed allocating triple properties to individual resources inside the triple. Some use RDF as is, some propose extensions to it.

One harmless solution is predicate decomposition. A statement s P o is decomposed into two relations s S p and p T o (S inverse functional, T functional). A new resource p, uniquely determined by s, P and o, reifies the instance of P holding between s and o. This is used in TermFactory for associating properties to labels (Designations) and to labeling relations (Terms). See TF Schema profiles.

ont:Sue meta:subjectOf [
    rdf:type ont:Owns ;
    meta:hasObject ont:Fido ;
    meta:context <http://foo> ] .

Nguyen et al. 2014 propose singleton properties. A singleton property is a property with only one subject and object. Its domain and range are both singletons:

:owns#1 rdfs:subPropertyOf :owns ;
    rdfs:domain [ owl:oneOf ( :Sue ) ] ;
    rdfs:range [ owl:oneOf ( :Fido ) ] ;
    meta:context <http://foo> .

A singleton property uniquely identifies one triple, so properties of the triple can be allocated to the property. On the downside, this approach bloats the schema vocabulary. Also since property names unlike blanks are global, there is a risk of unintended identification of triples.

Another variant of reification replaces the triple object by a blank node having the original object as rdf:value and associates the triple properties to the blank:

ont:Sue ont:owns [
    rdf:value ont:Fido ;
    meta:context <http://foo> ] .

Call this variant of reification value reification, and a model which represents quads in this way a contexted model. TF uses contexted models for dataset i/o. Contexted models are not as such compatible with OWL, since OWL requires datatype and object properties to be disjoint. A datatype statement cannot be reified by value reification in OWL, because a datatype property cannot have a blank node with properties as its object. But OWL inference does not work with datasets anyway.

Contexted blank objects in triples sharing subject, predicate, and the blank object's rdf:value are identified across models. An easy way to identify blanks when merging contexted models is to convert the merge into a dataset and then form a new contexted model out of the dataset.
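The correspondence can be sketched as follows, using the meta:context property of the earlier examples: the named-graph quad (in TriG) and the value-reified triple of a contexted model (in Turtle) carry the same information.

# as a quad in a dataset (TriG)
GRAPH <http://foo> { ont:Sue ont:owns ont:Fido . }

# as a value-reified triple in a contexted model (Turtle)
ont:Sue ont:owns [ rdf:value ont:Fido ; meta:context <http://foo> ] .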

RDFS

RDF Schema (RDFS) is a meta vocabulary for defining RDF vocabularies. It allows typing resources into classes and subclasses, and describing RDF properties.

An RDF property is a URI (named resource) that represents a binary relation viewed object-orientedly, as a property of members of its domain, taking values from a range. RDF properties are many-valued. RDF properties can be partially ordered with respect to generality, using the second order property rdfs:subPropertyOf. RDFS semantics says that the properties rdfs:subClassOf and rdfs:subPropertyOf are transitive. An RDF query for a property should also return triples labeled with its subproperties.
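One way to make a query subproperty-aware is a SPARQL 1.1 property path over rdfs:subPropertyOf; a sketch, assuming ont:owns may have subproperties in the data:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ont:  <http://www.tfs.cc/ont/>

# matches ont:owns and any (transitive) subproperty of it
SELECT ?owner ?pet
WHERE {
  ?p rdfs:subPropertyOf* ont:owns .
  ?owner ?p ?pet .
}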

OWL

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies, and is endorsed by the World Wide Web Consortium. OWL is considered one of the fundamental technologies underpinning the Semantic Web.

The Web Ontology Language OWL codes description logic, a subset of classical predicate logic designed to express the kind of statements used by people in defining concepts since Aristotle. OWL can express concepts and relationships like these: (translation to Manchester OWL syntax in grey)

Fido is a dog. :Fido rdf:type :Dog
Dogs are pets. :Dog rdfs:subClassOf :Pet
Pets have masters. :Pet rdfs:subClassOf :hasMaster Some :Human
Humans are not pets. :Human owl:disjointWith :Pet
Only humans have pets. :Pet rdfs:subClassOf :hasMaster only :Human
Fido has at least two masters. :Fido rdf:type :hasMaster min 2

The main forte of OWL is that it not only expresses those things (for the human reader), but machines can also understand and reason with them. Description logic engines like Fact++ or Pellet are able to relate such facts to one another automatically and give intelligent, that is, reasoned answers to questions about them, combining facts and drawing consequences that are only implicit in them. That is more than the average relational database does, maybe more than the average person is wont to do.

OWL semantics

Like RDF, OWL has a graph semantics, consisting of nodes connected by binary labeled arcs. But OWL goes past RDF by defining logical constructs with more complex semantics, whose RDF representation consists of several triples. An OWL processor knows to keep such triples together and map between them and alternative OWL syntaxes like OWL functional syntax or Manchester syntax.
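For example, the restriction "has some human master" from the table above is a single class expression in Manchester syntax (hasMaster some Human), but in the RDF mapping it expands to several triples, roughly:

:Pet rdfs:subClassOf [
    rdf:type owl:Restriction ;
    owl:onProperty :hasMaster ;
    owl:someValuesFrom :Human ] .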

OWL properties

OWL properties divide into three disjoint types: object properties, datatype properties, and annotation properties. An object property ranges over resources, a datatype property over literals. An annotation property's value can be anything. In OWL, there is no top property common to all, so an OWL query must always specify which type of property is being sought.
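In Turtle, the three property types are declared as follows; treating exp:baseForm as a datatype property is an assumption for illustration, ont:owns comes from the earlier examples, and rdfs:comment is a built-in annotation property.

ont:owns     rdf:type owl:ObjectProperty .      # values are resources
exp:baseForm rdf:type owl:DatatypeProperty .    # values are literals (assumed here)
rdfs:comment rdf:type owl:AnnotationProperty .  # values can be anything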

OWL annotations are for information that does not describe the domain itself but talks about the description of the domain. Annotation information is not part of the logical meaning of an ontology, which in practice means that they do not take part in OWL DL reasoning. In OWL 2 DL, one can declare domains, ranges and sub-properties for annotation properties. (See http://bloody-byte.net/rdf/dc_owl2dl/index.html .)

OWL versions and profiles

During TermFactory's lifetime, OWL has passed through two versions, OWL 1.0 and 2.0. The OWL 1.0 family of languages is based on two (largely, but not entirely, compatible) semantics: OWL DL and OWL Lite semantics are based on Description Logics, which have some attractive and well-understood computational properties, while OWL Full is intended to provide compatibility with RDF Schema. OWL ontologies are commonly serialized using RDF/XML syntax.

In October 2007, a new W3C working group was started to extend OWL with several new features as proposed in the OWL 1.1 member submission. This new version, called OWL 2, has already found its way into semantic editors such as Protégé and semantic reasoners such as Pellet, FaCT++ or HermiT. For TF, OWL 2 has better metamodeling and annotations.

The new features of OWL 2 include

  1. extra syntactic sugar to make some common statements easier to say, e.g., the disjoint union of classes
  2. new constructs that increase the expressivity for properties, e.g., qualified cardinality restrictions or property chain inclusion, database style keys
  3. extended support for datatypes, e.g., data type restrictions and facets for restricting a datatype to a subset of its values
  4. simple metamodeling capabilities to express metalogical information about the entities of an ontology
  5. extended annotations capabilities to annotate entities, ontologies and also axioms
  6. other major innovations: declarations, new language profiles (sublanguages).

OWL 2.0 comes in three profiles, OWL 2 EL, OWL 2 QL and OWL 2 RL.

  • OWL 2 EL
    • Captures expressive power used by many large-scale ontologies, e.g., SNOMED CT and the NCI thesaurus;
    • Features include existential restrictions, intersection, subClass, equivalentClass, class disjointness, range and domain, object property inclusion (SubObjectPropertyOf), possibly involving property chains, data property inclusion (SubDataPropertyOf), transitive properties, keys (hasKey), …;
    • Missing features include value restrictions, cardinality restrictions (min, max and exact), disjunction and negation.
  • OWL 2 QL
    • Captures expressive power of simple ontologies like thesauri, and (most of) expressive power of ER/UML schemas;
    • Features include limited form of existential restrictions, subClass, equivalentClass, disjointness, range and domain, symmetric properties, …;
    • Missing features include existential quantification to a class (ObjectSomeValuesFrom), self restriction (ObjectHasSelf), nominals (ObjectHasValue, ObjectOneOf), universal quantification to a class (ObjectAllValuesFrom), cardinality restrictions (ObjectMinCardinality, ObjectExactCardinality), disjunction (ObjectUnionOf, DisjointUnion), etc.; cf. the Profile document for an exhaustive list of missing features.
    • Can be implemented on top of standard relational database.
  • OWL 2 RL
    • Includes support for most OWL 2 features;
    • But with restrictions placed on the syntax, for example it does not include existential on the right hand side of axioms (which often occurs in Life Sciences ontologies, e.g., SNOMED). Standard semantics only apply when they are used in a restricted way;
    • Can be implemented on top of rule extended DBMS e.g., SQL (see Implementation Perspective).
  • OWL 2 EL is the maximal language for which reasoning, including query answering, is known to be worst-case polynomial.
  • OWL 2 QL is the maximal language for which reasoning, including query answering, is known to be worst case logspace (same as DB).
  • OWL 2 RL allows for polynomial reasoning (consistency, classification, and instance checking) using rule-based technologies.

There is an online OWL 2 validator and an OWL syntax converter .

Instances, classes, properties and roles

As a species of description logic, OWL is a decidable subset of first order predicate logic with an object-oriented syntax. Like any first order logic, it makes a sortal distinction between instances, or "things out there", and classes of such things. OWL goes past simple classification since it can talk about two-place relations. A binary relation is viewed object-orientedly (or should one say egocentrically) as a property of its subject, having the object as value. Nevertheless, such properties are by default multivalued, and they can have inverses, like parent and child. Most characteristically, OWL deals with roles. Roles are classes one belongs to in virtue of having a property (that is, a relation to something). Most of our apparent inherent properties are at depth roles in some bigger configuration. Something is big only in relation to something small, or a master only as the master of a slave. OWL cannot say very complicated things, but it is good at defining roles. And roles are a good approximation to the classical Aristotelian way of forming terminological definitions: "A species is that subclass of a genus which has the differentiae that...". To give the same point a linguistic turn: OWL has a good match with the natural language syntax of relational nouns and relative clauses traditionally used in terminological definitions.

The TermFactory term ontology schema

The TermFactory term ontology schema TFS.owl defines the skeleton of a multilingual terminological ontology in OWL, conformant with current terminology standards and other language technology standards , rich and precise enough to support semantic inference and language technology applications.

One of the liberating insights from describing terminology in a logic rather than by a fixed schema - whether hierarchical (tree structured) like XML or tabular (record structured) like a relational database - is that there is nothing sacrosanct about the TFS schema vocabulary. Not only does TermFactory lack a fixed entry format, there is no unique fixed signature (set of reserved words) either. As long as there are ontologies and reasoning, there does not have to be a fixed meta vocabulary for terms. A seed vocabulary like TFS.owl and its companions is there only to help get started and provide a fixed point for conversion by reasoning. Beyond that, everyone is free to use their own vocabulary and conceptualization, provided an ontology is supplied for it, plus bridge rules to map it to others, and eventually to TFS.owl in particular.

Like classical logic, OWL has an open world assumption built in. There is no way in first order logic to quantify over properties, hence to enumerate all and only the properties that a class allows to its members. As a consequence, an OWL schema can only name individual properties that must or must not be present. In this respect an RDF or OWL schema differs from a database or XML schema, which works with a closed world assumption (whatever is not specifically allowed is prohibited).

Like classical logic, OWL has no unique name assumption built in. A resource need not have a name at all. A concept may or may not have a global identifier to hang descriptions on, but as long as it is a valid URI, it matters little what that identifier is. A resource may have many aliases. Two names must denote the same object only if they are asserted or can be inferred to do so.

In fact, OWL cannot even express completeness: these and no other properties are allowed for members of this class. It is possible to say of some given class and property that they are incompatible. One can say for instance

ex:Class owl:disjointWith [
    rdf:type owl:Restriction ;
    owl:onProperty ex:property ;
    owl:someValuesFrom owl:Thing ] .

or the other way round,

ex:property rdfs:range [ owl:complementOf ex:Class ] .

Unlike XML, there is no formal distinction in OWL between an ontology and an ontology schema. One just draws a line somewhere between statements that must hold in all TermFactory ontologies and the rest. The common core is called the TF ontology schema (TFS.owl). Together with its extensions (TFStrict.owl, TFTop.owl, TFSem.owl) and profiles (LegacyStrict.owl, DictionaryStrict.owl) it forms the top ontology in TF.

Like classical logic, OWL is weakly typed. It is possible, but not obligatory, to specify individual type, superclass of class, and domain and range of object properties. The default is the vacuous type owl:Thing . This weakness can be a strength. With strict typing, it can be necessary to create dummy objects just to satisfy typing requirements. In TF, it is rarely necessary. Therefore TF can accommodate a variety of approaches to terminological description. (See section on term properties below.)

The OWL standard in itself is about semantics, or concepts in abstraction from the (human) language in which they may be expressed. Beyond that, the standard has little to say about language or multilinguality.

The RDF standard provides the property rdfs:label, with values in language-tagged Unicode strings, to specify alternative human readable labels for classes. This format is perhaps the most common one in RDF data. It corresponds to the simplest TFS profile, TF Label.

Over and above rdfs:label, TF supports an explicit OWL ontology vocabulary for human language expressions and terms that allows full ontological control of the natural language expressions associated to concepts. TFS does not presume to replace the lightweight ontology localization provided by rdfs:label properties. For many purposes, rdfs:label is quite sufficient. But TF provides a (relatively) painless migration from it to increasingly fuller ontologies of designations and terms (and back).

rdfs:label properties can be converted to TF terms in many ways using TermFactory utilities and queries. Query pellet4tf query -F -q sparql/construct-designations-for-rdfs-labels.sparql school.n3 > grid.ttl constructs a TF Form ontology out of a collection of ontology concepts with rdfs:label properties. Query pellet4tf query -F -W reblank -q sparql/construct-designatedby-terms.sparql deblank-home:/io/grid.ttl constructs a TF Sign ontology out of the TF Form ontology. New terms and expressions can finally be given descriptive names with a relabeling query. Query query -N -1 -F -q sparql/construct-tfs-full-for-rdfs-labels.sparql lionfish.ttl prefix.ttl > fullfish.ttl accomplishes conversion from rdfs:label localization labels to TF Sign terms and designations with descriptive names.
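The gist of such a conversion can be sketched as a SPARQL CONSTRUCT; the bundled queries named above are the authoritative versions, and the namespace IRIs and the exact linking property used below are illustrative only.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# placeholder namespace IRIs
PREFIX exp:  <http://www.tfs.cc/exp/>
PREFIX term: <http://www.tfs.cc/term/>

# reify each rdfs:label into a Designation; a real conversion would also
# set the language code and morphological category of the new designation
CONSTRUCT {
  _:d a exp:Designation ;
      exp:baseForm ?label ;
      term:designates ?concept .
}
WHERE {
  ?concept rdfs:label ?label .
}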

TF terms

A TF designation, an instance of TF class exp:Designation, is a reification of an rdfs:label literal: a resource having minimally the key properties string, language code, and morphological category tag.

A TF term, an instance of TF class term:Term, is an association (extensionally equivalent to an ordered pair) of a language independent concept with a natural language designation. This accords with terminology standard DIN 2342-1 and de Saussure's definition of a linguistic sign.

This definition makes terms symmetric between concept and designation. In ordinary usage, sign words like term are relational nouns or roles for expressions. Terms are said to have properties of their designations: they have length, are composed of words, and have part of speech and other grammatical features. Indeed, another standard definition has it that a term is an expression that denotes a given special language concept. (Note: TF singles out designations as the subclass of language expressions that name concepts.)

But terms also have properties related to the referent, like abstract, concrete, mass, chemical. Third, they have properties that may relate to the term only as a sign for just this concept, like rare, obsolete, preferred. In a multidomain terminology collection, an expression can designate more than one concept, and have conflicting properties in its different roles. In that case, one cannot identify the expression with the term/s as such. One way out is to reify the term, and that is what TF does.

Consider the following example of the two strings home and hallitus. The English word home has the general language meaning "abode", and a special domain meaning "institution" (as in "we must put granny into a home"). The same string means mildew in Finnish. The Estonian word for mildew is hallitus. The same string in Finnish means government. Each such pairing of a meaning (blue) with a form (yellow) is a term/sign (green) in TF.
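Written out in Turtle, a fragment of the example might look roughly as follows; the resource names are invented for illustration, term:hasTerm and term:hasDesignation are as in the TF signature table further below, and exp:baseForm with a language tag is an assumption about how the string and language code are recorded.

# the Finnish string "home" designating the concept Mildew
ont:Mildew term:hasTerm term:fi-home-Mildew .
term:fi-home-Mildew a term:Term ;
    term:hasDesignation exp:fi-home-N .
exp:fi-home-N a exp:Designation ;
    exp:baseForm "home"@fi .

# the English string "home" designating the concept Institution
ont:Institution term:hasTerm term:en-home-Institution .
term:en-home-Institution a term:Term ;
    term:hasDesignation exp:en-home-N .
exp:en-home-N a exp:Designation ;
    exp:baseForm "home"@en .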

Show/hide TF terms

TF terms

The core TF schema TFS.owl only defines the skeleton of the TermFactory special language term ontology. In its intended model, a term is a pair of a designation and a referent. The designation is a natural language expression. The referent can be a general concept (class pun) or an individual concept (named individual: a person, building, country etc.). TFS.owl does not entail all the properties of this intended model, not even all those expressible in OWL. More properties of the intended model can be imported from schema extensions as they are needed. In particular, they may be useful for integrity testing, to make sure that a term ontology reflects the intended model. The point is that not all of the properties are always needed, and the more of them are asserted, the more work there is for reasoning. In particular, axioms which entail separation of classes and distinctness, existence or uniqueness of instances are expensive to reason with, because they increase the size of the model by multiplying the number of entailments and blank nodes. Such complicating axiom types involve RDFS vocabulary like rdfs:domain, rdfs:range, rdfs:subPropertyOf and OWL vocabulary like owl:sameAs, owl:differentFrom, owl:disjointWith, owl:Restriction, owl:inverseOf, owl:FunctionalProperty and more.

TF top ontology

What the very top of an ontology looks like is largely philosophical, little hinges on that choice in practice. But TF has a top ontology, and a philosophy to go with it.

Following Peirce, a term is a linguistic sign, a special case of a sign. There are other signs: general language words, formulae, images, traffic signs. Following de Saussure, signs are pairings of a Form, the signifier (signifiant), with a Meaning, the signified (signifié). Specifically, a term pairs a designation with a referent. The referent is a concept (for general terms) or a named individual (for appellations).

An old terminology standard, DIN 2342 (Begriffe der Terminologielehre), defines a term as „zusammengehörige Paar aus einem Begriff und seiner Benennung als Element einer Terminologie“ (a pair, belonging together, of a concept and its designation as an element of a terminology). This definition conforms to de Saussure's definition of the linguistic sign.

But TF need not fix any one notion of a terminology entry as the one and only correct record structure. (Being a logic, it can define and relate many such notions.) TF builds on the semantic network metaphor, dual to the container metaphor of hierarchical databases (and more recently, XML). The container metaphor comes from physical media like paper or magnetic tape. Containment among convex objects naturally forms tree structures. In a rooted directed tree, it makes sense to talk of nodes as bigger elements containing smaller elements. In an undirected tree or graph, all nodes are equal; any node can be taken as root or focus. Nodes do not contain one another; rather, they are visualised as dots connected by links. RDF or OWL graphs are not rooted. The serialisation of a semantic network in RDF/XML need not respect connectedness. Information concerning a given node may be distributed freely among disconnected descriptions in an RDF/XML document.

As descendants of classical logic, ontologies are weakly typed. It is possible, but not obligatory, to specify individual type, superclass of class, and domain and range of object properties. The default is the vacuous type owl:Thing. TF tries to take advantage of this. Different typing regimes support different semantics (reasoning) and different use cases of TF. With strict typing, it can be necessary to create dummy objects just to satisfy typing requirements. With weak typing, it is not necessary.

An application of weak typing is that TF descriptions can be forms, meanings or signs. This allows the whole gamut of attaching uninterpreted labels, language specific messages, and language independent content as descriptions of other such entities.

Weak typing allows treating concepts, terms, and expressions intensionally, as roles of objects of a kind. An expression is a term without its semantic properties, and a concept is one without expression properties. No sortal distinction is made between things out there and their names. All instances can be signs. Individuals get identified with individual concepts. Jesus Christ can belong simultaneously to classes Human and Noun. In traditional (Western) terminology theory, these classes would be disjoint. By Aristotle's (and Quine's) definition of ambiguity, this makes resources ambiguous. Ambiguity is not harmful as long as no reasoning is done that hinges on it. One can translate without resolving an ambiguity when the translation retains the same ambiguity. This theme is pursued further in the section on TermFactory schema profiles.

Also to keep in mind is that an OWL ontology is not just a taxonomy in the classical Aristotelian sense of a tree of genera and species. It is a semantic network, a collection of nodes and arcs representing objects and binary relations between them. As a special case of that structure, a set of distinguished binary relations including rdf:type, rdfs:subClassOf, owl:sameAs, owl:differentFrom and owl:disjointWith gives the graph a monadic first order class structure, like Boolean algebra or Venn diagrams. That structure allows many competing taxonomies as trees spanning the graph, none the more or less true than the others. Any given taxonomy can be thought of as a tree-formed index to the Boolean class structure, serving the purposes of efficient insertion and retrieval of instances.

A two-dimensional tree representation of an OWL taxonomy misleadingly suggests that the classes branching out from a class as its subclasses must be disjoint. In general they aren't. Logically, the rdfs:subClassOf relation is a partial order, not a tree. A class can have many superclasses in the same "tree" (directed acyclic graph). Thus an OWL ontology tree can have as sisters classes that at best belong to parallel cross-classifying taxonomies. This happens when there are orthogonal bases of classification, like classification by structure (parts) and by function (roles). In standard terminography, there is a special notation for such many-dimensional cross-classifying taxonomies (see e.g. ISO FDIS 704 example 8). Unfortunately OWL editors do not support it.

TF signatures

There are many instances of such simultaneous cross-cuts at the top of the TF schema. The triad Sign-Form-Meaning is cross-cut by the class Description. TF descriptions can belong to any of the other three classes. Second, Meaning is subdivided into object-like (extensional, existent) Concepts and proposition-like (intensional, truth valued) Contents on the one hand, and into count sem:Count and noncount sem:NonCount meanings on the other. These two dichotomies cross-classify to form a four-field. Samples: individual, material, proof, evidence. The following tabulation shows the central terminological notions and their semiotic counterparts. A difference between the concept/term/designation triad and the more general triad meaning/use/expression is that the former pertains to context-independent concepts, conventionalised terms and dictionary forms, while the latter includes context-dependent meanings, more ad hoc conventions, and inflected forms. For instance, labels for concepts in the TF user interface are less than terms, but still parts of a special language.

Signature meaning link sign link form duality type
sign sem:Meaning sign:hasSign sign:Sign sign:hasForm syn:Form either
sem:Notion sign:Symbol syn:Label object (exists/not)
sem:Thought sign:Message syn:Formulation proposition (true/false)
term ont:Reference term:hasUse term:Use term:hasExpression exp:Expression either
ont:Referent term:hasTerm term:Term term:hasDesignation exp:Designation object (exists/not)
ont:Content term:hasDescription term:Description term:hasText exp:Text proposition (true/false)
dictionary sem:Synset sign:synsetOf sign:Sense sign:senseOf syn:Lemma

The triad ont:Reference - term:Use - exp:Expression covers all uses of special language to signify something, whether lexicalised or occasional, nominal or textual. The dictionary signature is directed from form to meaning according to tradition.

Category theory for terms

An Aristotelian insight is that the distinction between meaning and form is not absolute: not a classification but a relation. Matter (meaning) and form are correlative in the way domain and codomain of a morphism are relative to the morphism. In normal TF usage, an expression is a general language vocabulary item whose meaning is narrowed or transferred when it is used as a special language term. The common language expression, and common language in general, is form relative to special-language terminology. The special language expression borrows linguistic properties from the conventional meaning of the general language expression. But that general language expression can be further analysed as a linguistic sign with form and meaning. This notion accords well with the spirit of universal algebra ( category theory ): morphisms are closed under composition. Terms and expressions are also (instances of) concepts. TF does not say terms and expressions are disjoint classes. An instance which is at once a term and an expression can have properties of both classes.

TF, as an ontology of terms, takes its inspiration from the category theoretic abstraction of morphism, a structure preserving mapping from one domain to another. This is taken as the germ of the idea of one thing meaning or representing another. A morphism is the reification of a meaning relation as an object. A sign is a special case of such a reification, as a pairing between some form and some meaning, or as the role played by a form at the receiving end (codomain) of such a mapping.

The underlying motive of the form-meaning morphism is the core category theoretic notion of limit (universal element). A limit in a category is an object through which all the remaining objects can be factored. It takes out the slack from the category, leaving its common core. In translation, a language-independent (interlingual) meaning, shared by a set of synonyms, is a limit (universal element) through which an equivalence class of synonyms is factored. In graph terms, the reified meaning constitutes the hub of a spanning subtree of form-meaning relations that removes the redundancy of a square matrix of bilateral synonymy relations. Recursively applied, the reification of synonymy into meaning allows logarithmic savings in the size of the representation (from m times n to m plus n). Dually, the meaning-form morphism allows analogous savings in the representation of homonymy. (In the original Aristotelian sense: two things are homonymous if they are called by the same name). Properties of the shared name need only be mentioned once. Monolingual sector terminologies have little need for this split because of the ideal of monosemy per special language: an expression should have just one sense per subject field. The situation is different across sectors and (sub)languages: one and the same expression (say operation ) is used in many fields.

Show/hide limit diagram

Form and meaning as limits

An abstract way of looking at concepts, terms, and expressions emerges from Aristotle's insight. A term results from the reification of a labeling relation, obtained by repeatedly splitting the relation into the composition of two others and interpolating an intervening resource. rdfs:label equals term:referentOf o term:hasDesignation o exp:baseForm.
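As a SPARQL property path, the same composition can be queried roughly as follows, reading the chain from the referent through the term and designation to the base form; the property directions follow the signature table above, and the namespace IRIs are placeholders.

PREFIX term: <http://www.tfs.cc/term/>   # placeholder IRIs
PREFIX exp:  <http://www.tfs.cc/exp/>

# the path plays the role of an rdfs:label shortcut
SELECT ?referent ?label
WHERE {
  ?referent term:hasTerm/term:hasDesignation/exp:baseForm ?label .
}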

Also interestingly, the two extremes, concepts and expressions, appear dual. A concept is a "set" of expressions, and an expression is a "set" of concepts. This is not set theoretically sane as such, but makes category theoretic sense. Assuming both concepts and expressions form Boolean algebras (closed under join, meet, and complement) and there is a bijection between the atoms of one and the coatoms of the other ('basic' concepts have 'basic' designations, as they do in any ontology), it is possible to find for any concept a uniquely corresponding expression, and for an expression a uniquely corresponding concept. It is in the sense of this bijection that concepts "are" expressions. Concepts, as limits of similar real things on the one hand, and limits of synonymous expressions on the other hand, share ontological character with terms and expressions.

Categorially taken, then, concepts "are" expressions, just dually related: a concept "is" a synset, an expression "is" a homset (the set of all the concepts that it names). Except we should not talk about sets. Since we are metamodeling classes and set theoretic relationships already, we can use the metamodeling instances and properties to build a purely algebraic model. The dual map between concepts and expressions is term:designates: E designates C, where, since both sides are Boolean algebras, we can always choose E and C so that designates is one-one.

A naive term list enumerates terms one by one, listing all properties associated directly or indirectly to each in one go, without distinction: properties related to its referent(s), lexical properties, and properties related to the term as such. This shallow end of the term pool is what we call TF Label. At the deep end, the commonalities between terms that share meaning are reified to a separate entity, the shared meaning; the shared semantic properties are associated to that, and the meaning is associated to the term. Symmetrically, the commonalities between terms that share lexical and grammatical properties are collected to a new entity, the shared expression. Intermediate cases are obtained by making or not making the term/concept or term/expression splits individually. The aim is non-redundancy: every property of a resource must provide a fact about the resource, the whole resource, and nothing but the resource, "so help me Codd".

Nothing is absolute here: terminology is an applied science. What seems one and the same concept in one analysis can get refined into many later. By parity, the category of expressions allows refinement: what counts as an expression for one purpose can become a term with a meaning under further analysis.

A concept is a category theoretic limit of equivalent signs. A concept has specialisations, including individual concepts. The extensional dual of a concept is a class, which includes classes and contains things. The content of a message is its gist, a nonredundant variant of it, without the slack. Its extensional dual is a proposition, a set of possible worlds, that contains individual extensional contexts (models, valuations).

Orientation, literally, has to do with the orientation of the RDF graph around ("about") a node in the graph. A given layout is determined by the choice of root and the order of traversing the edges of the graph to obtain a spanning subtree that constitutes a tree-formed hierarchical representation of it, and, eventually, the choice of serialisation of the tree into a one-dimensional stream of characters.

Generic classification

TF provides a representation of the classical Aristotelian type of classification of a domain into a Porphyrian tree of genus and species by differentia used in traditional Wüster style terminology theory.

In a generic classification tree, the values (species) of a dimension (differentia) form a partition of the superclass (genus). The species are disjoint with each other and union of the species is the genus. For instance, a red object (member of species 'red object') is a colored object (genus). It has color (differentia) red, and red is a color (dimension). The differentiating property 'has color' is a subproperty of 'has type', and the dimension 'color' is a second order class of the different colors (red, blue, green, ...) . All the colors are disjoint and together exhaust the class of (colored) objects.

A partition of a type into subtypes is accomplished by a collection of subproperties of rdf:type with range in a dimension, a (second order) class of disjoint classes, for example color or grammatical number. The property assertion ont:hasColor ont:Red then equals the type assertion rdf:type ont:Red.

ont:Color rdfs:subClassOf owl:Class .
ont:Red rdf:type ont:Color .
ont:hasColor rdfs:subPropertyOf rdf:type ; rdfs:range ont:Color .
ont:China ont:hasColor ont:Red .
ont:China rdf:type ont:Red .

exp:Singular rdf:type exp:Number .
exp:hasNumber rdfs:subPropertyOf rdf:type ; rdfs:range exp:Number .
exp:en-China-N exp:hasNumber exp:Singular .
exp:en-China-N rdf:type exp:Singular .

Differentia and dimensions as second order objects are as such not definable in OWL DL. But they can be metamodeled in OWL 2. In effect, color names are made systematically ambiguous between class and named individual.

The dimension Color is metamodeled by

:Color owl:oneOf ( :Red ... :Green ) .

[ a owl:AllDifferent ;
  owl:members ( :Red ... :Green ) ] .

The first order class Colored can be defined by enumeration as disjoint union of the various colors. In OWL 1, a partition is written like this:

:Colored a owl:Class ; owl:unionOf ( :Red ... :Green ) .
:Red owl:disjointWith :Blue, :Yellow, ..., :Brown, :Green .
:Blue owl:disjointWith :Yellow, ..., :Brown, :Green .
...
:Brown owl:disjointWith :Green .

With OWL 2, this can be abbreviated to

:Colored a owl:Class ; owl:unionOf ( :Red ... :Green ) .

[] a owl:AllDisjointClasses ;
   owl:members ( :Red ... :Green ) .

or just to

:Colored a owl:Class; owl:disjointUnionOf ( :Red ... :Green ) .

Differentia are metamodeled as

:hasColor a owl:FunctionalProperty ; rdfs:domain :Colored ; rdfs:range :Color .

The two senses of 'red' can be related in OWL by

:Red owl:equivalentClass [
    a owl:Restriction ;
    owl:onProperty :hasColor ;
    owl:hasValue :Red ] .

OWL cannot express the generalization of the above rule to arbitrary classes and subproperties of meta:hasType. The TF Pellet OWL reasoner has been extended to draw this inference: for instance, if an object has color red, then the object is red.
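Outside OWL, the generalization can be approximated with a rule, for instance as a SPARQL CONSTRUCT over subproperties of meta:hasType; a sketch, assuming ont:hasColor is declared a subproperty of meta:hasType (the meta: IRI is a placeholder):

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meta: <http://www.tfs.cc/meta/>   # placeholder IRI

# whenever a differentia (a subproperty of meta:hasType) holds, assert the type:
# e.g. from ont:China ont:hasColor ont:Red infer ont:China rdf:type ont:Red
CONSTRUCT { ?thing rdf:type ?value . }
WHERE {
  ?p rdfs:subPropertyOf* meta:hasType .
  ?thing ?p ?value .
}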

Show/hide color classification

Classification by color

Differentia are used to define grammatical codes in TF in TFCodes.owl . For example the datatype property exp:number "sg" is equivalent to the differentia exp:hasNumber exp:Singular which in turn is just syntactic sugar for rdf:type exp:Singular.
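Following the China example above, the three equivalent ways of recording singular number are:

# datatype code
exp:en-China-N exp:number "sg" .

# differentia
exp:en-China-N exp:hasNumber exp:Singular .

# plain typing statement
exp:en-China-N rdf:type exp:Singular .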

TF ontology extensions

The following table and graph depict the TF schema ontology with its extensions. The clusters and colors represent conceptual and functional groupings. There are no built in import relationships between the extensions, to allow free mixing and matching of the extensions in different combinations.


TFS extensions
TFS.owl contains just enough for day-to-day business with validated TF terms.
TFTop.owl extends TFS.owl for general language vocabulary.
TFwn.owl bridges Princeton Wordnet to TFTop.owl
TFSem.owl extends TFS.owl for NL semantics, in particular, semantic role frames.
TFProp.owl contains a taxonomy of TF properties for querying and conversion.
tf-TFS.owl and its language specific extensions ??-TFS.owl localize the TF Schema vocabulary.
TFLang.owl contains ISO language codes in TF format.
TFCat.owl contains TF part of speech and inflection codes in TF format.
TFCtry.owl contains ISO country codes in TF format.
TFSField.owl contains the TF subject field classification.
TFStrict contains integrity constraints for validating a TF term ontology.
DictionaryStrict separates terms from expressions.
LegacyStrict separates terms from concepts.

The WordNet to TF bridge TFwn.owl places the WordNet class Synset as a subclass of TF class Meaning. WordNet senses become instances of Sign (with property term:approximate "true" as default). WordNet synsets differ from TF Sign style concepts by having an inherent part of speech, which in TF Sign is a form property. Also, WordNet senses have a label, which in TF Sign is a form property. This makes WordNet an instance of the Legacy profile with some traces of Lite: Synsets are instances of Meaning but they have part of speech; Senses are instances of Sign but they have base forms. There is an additional WordNet class Word that contains words (Forms). See the section on morphology.

TF Schema profiles

Different TF Schema profiles answer differently the question which terminological notions are reified as resources in their own right, i.e. become first order entities having properties, and which ones can remain virtual (literals, classes, properties and roles). An answer to this question is also implicit in the various terminology theoretical definitions of what a term really "is". There are two related reasons to reify some notion into a resource: it must have properties, and positing it makes the model smaller through sharing. The latter is the question of database normalisation, transposed to the ontology setting.

The (now outdated) DIN 2342 (Begriffe der Terminologielehre) defines a term as „zusammengehörige Paar aus einem Begriff und seiner Benennung als Element einer Terminologie.“ (a pair consisting of a concept and its designation that belong together, as an element of a terminology). In contrast, the ISO 1087-1:2000 definition of term reads:

verbal designation (3.4.1) of a general concept (3.2.3) in a specific subject field (3.1.2)

Note:

A term may contain symbols and can have variants, e.g. different forms of spelling.

One (perhaps not the intended) reading of this definition is that a term is a verbal expression, considered in the role of a designation of a general concept. (Another ISO formulation, 'Designation of a defined concept in a special language by a linguistic expression' [ISO 1087:1990] is vague between two construals.) A literal ontological reading of the latter definition would be

term:Term rdfs:subClassOf exp:Expression ,
    [ rdf:type owl:Restriction ;
      owl:onProperty term:designationOf ;
      owl:allValuesFrom ont:Concept ] .

In this construal, term:Term is a role of exp:Designation in the technical sense of OWL. It is that subclass of exp:Designation whose members designate special language general concepts. This reading is a variety of the TF Legacy profile. On this reading, terms/expressions shared between domains are not just similar, they are the same. This is because here term:Term is not reified into an individual. An expression may have many names corresponding to its roles as designation for different concepts, but the names denote the same entity. The terms cannot have conflicting term properties, or conflicting term and expression properties (say, owner), since they denote the same thing.

Different profiles are useful for different descriptive purposes. Given the protean structure of the RDF triple soup, it is not necessary to bundle all information into one big procrustean construction to tag along for all purposes. For example, here are three important uses for TF. These three applications can be aligned with the aims of information science, lexicography, and terminology, respectively.

  1. Association of concepts with natural language labels (ontology)
  2. Grammatical description of natural language expressions (lexicography)
  3. Disambiguation of natural language terms used for concepts (terminology)

For each purpose, appropriate TF schema profiles are defined, called TF Label, TF Dictionary, and TF Term, respectively.

A special field ontology typically does not go further than labeling its concepts with natural language strings, leaving it to users to disambiguate and process the labels. String labels are also sufficient for traditional interface localization tasks.

It is most efficient to separate special language concept analysis, done by special field experts, from the grammatical description of (common language) words and phrases, done by linguists and language technology. The proper association of natural language designations with special language concepts is the domain of terminologists. By allowing separation and merging of each type of content, TermFactory should allow each sector to do their work independently as well as in concert, having access to the others' expertise but also keeping out of each other's way.

We distinguish two limiting cases of TF schema profile called TF Label and TF Sign. TF Label is the base case, other profiles are implemented as additional axiom sets on the TF schema. TF Label adds no axioms.

Comparing TF Label semantics to TF Sign semantics, the main difference is whether meaning is treated as a first order (syntactic) relation between terms or a second order (semantic) relation between classes. Technically, the distinction is between using application specific instance level properties like sign:hyponymOf or OWL properties like owl:subClassOf . The former support dealing with taxonomies of named entities. OWL class semantics provides the power of description logic reasoning: Boolean and some relational reasoning about classes defined by description not just by name.

The TFS profiles cross-cut TF signatures. Thus TF Sign has a terminological analogue TF Term, where the term signature ont:Concept, term:designatedBy, exp:Designation etc. replaces the respective generalizations sem:Meaning, sign:signifiedBy, syn:Form, etc., and a lexicographical instance TF Lemma with the signature sem:Synset, sign:signifiedBy, syn:Lemma.

To support the different semantics, tools will be provided to rewrite an ontology from one level to another. Among other things, this simplifies conversion to TF. Conversion libraries only need to support direct conversion to the closest subset of TF. TF specific SPARQL conversion scripts automate upgrading/downgrading between levels by automating the necessary splits/merges. TF Label may be an easier target to convert to from contrastive terminologies of the lexicographical type. On the other hand, TF Sign is more transparent to OWL reasoning. See section on TFS profile conversion .

TF Label

The RDF standard provides the attribute rdfs:label, with values in language-tagged Unicode strings, to specify alternative human readable labels for classes. This format is perhaps the most common one in RDF data. It corresponds to the simplest TFS profile TF Label. TF Label is unproblematic when forms and meanings match one to one and need no further properties, so that there is no call to reify them separately.

TF Form

The TF Form profile abstracts one step further from TF Label in that it reifies the language tagged string datatype value of rdfs:label as a form (designation) separate from the meaning (concept), relating the pair with an object property. The triple

:foo rdfs:label "label"@lang .

becomes the graph

:foo sign:signifiedBy [ a syn:Form ; syn:lemma "label" ; syn:lang "lang" ; rdfs:comment "this is a form" ] .

The TF Form profile does not separate terms and designations (signs and forms). A multilingual word list consisting of comma separated values like Finland,fi,Suomi is simple to rewrite to valid TF Form:

<http://tfs.cc/ont/Finland> term:designatedBy [ exp:langCode "fi"; exp:baseForm "Suomi" ] .

An advantage of using vocabulary consisting of term:designates and rdfs:label is that it avoids ontological questions. An rdfs:label with a language-coded string makes the least possible ontological commitment. We avoid awkward questions of expression identity and ontology mapping. Assume triples like the one above are added to a model. If one is just interested in what labels a concept might have, a query for rdfs:label will include these. If the reasoner understands composition, it would also return TF baseforms that could be composed into labels. What if the query is for base forms of expressions? For the reasoner to answer a baseform query with an rdfs:label, it would need to do decomposition, that is, apply property chain axioms in reverse to unpack rdfs:label and infer the existence of a (blank) expression having as baseform that label. Existential instantiation is not what ontology reasoners typically do, so chances for real time queries are slim. But this reasoning could be implemented through some offline processing. The main thing is that the semantics is right. (In the long run, it is counterproductive to avoid ontological questions about expressions. Without ontological commitments, we cannot keep track of our vocabularies: manage sources, authorship, versioning etc.)

With composition (property chains), we can make exact sense of the Aristotelian intuition that the form-matter relationship is a relative one and forms a scale. To express that a special language concept is designated by an expression that in turn has a general language meaning, we go over duals like this:

C term:designatedBy E term:designates M

where C meta:subClassOf M . Adding decomposition into terms and signs, this becomes

C term:referentOf T sign:hasForm E sign:formOf S sign:hasMeaning M

where T is a special language term and S is a lexicographic sense of the same expression E.
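A hedged instantiation of this chain, reusing the category theoretic limit example from the namespace conventions section below; the resource names are assembled here for illustration only:

ont:Limit term:referentOf term:en-limit-N_-_ont-Limit .           # C referentOf T
term:en-limit-N_-_ont-Limit sign:hasForm exp:en-limit-N .         # T hasForm E
exp:en-limit-N sign:formOf sign:en-limit-N_-_sem-Limit .          # E formOf S
sign:en-limit-N_-_sem-Limit sign:hasMeaning sem:Limit .           # S hasMeaning M
ont:Limit meta:subClassOf sem:Limit .                             # C subClassOf M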

A mapping of TF Form into TF Label , the minimalist language coded rdfs:label localization, could be obtained if we could define the composition of object property designatedBy with datatype property exp:baseForm and subsume it as subproperty of rdfs:label , hiding the language code as a datatype string attribute in the string value:

<http://tfs.cc/exp/Finnish> rdfs:label "suomi"@fi .

OWL 2 does not cover composition of object and datatype properties . But TF Label can be derived from TF Form using SWRL rules from TFRules.owl.

<swrl:Imp rdf:about="/TFLabelRule">
  <swrl:head rdf:parseType="Collection">
    <swrl:DatavaluedPropertyAtom>
      <swrl:propertyPredicate rdf:resource="&rdfs;label" />
      <swrl:argument1 rdf:resource="#x" />
      <swrl:argument2 rdf:resource="#z" />
    </swrl:DatavaluedPropertyAtom>
  </swrl:head>
  <swrl:body rdf:parseType="Collection">
    <swrl:IndividualPropertyAtom>
      <swrl:propertyPredicate rdf:resource="&term;designatedBy" />
      <swrl:argument1 rdf:resource="#x" />
      <swrl:argument2 rdf:resource="#y" />
    </swrl:IndividualPropertyAtom>
    <swrl:DatavaluedPropertyAtom>
      <swrl:propertyPredicate rdf:resource="&exp;baseForm" />
      <swrl:argument1 rdf:resource="#y" />
      <swrl:argument2 rdf:resource="#z" />
    </swrl:DatavaluedPropertyAtom>
  </swrl:body>
</swrl:Imp>

TF Form conversion using composition is an example of OWL turning what first looks like a syntactic conversion into a matter of description logic reasoning. Instead of running a syntactic conversion script, we reason with a bridge ontology.

As a syntactic profile conversion, designations can be constructed out of rdfs:labels using SPARQL 1.1 functions. The query

pellet4tf query -q home:/etc/scripts/construct-designations-for-rdfs-labels.sparql school.n3 > home:/io/grid.ttl

constructs a TF Form ontology out of a collection of third party ontology concepts with rdfs:label properties. (Cf. section on SPARQL endpoints.) The converse reduction of TFS Compact designations to rdfs:label properties is provided by script home:/etc/scripts/construct-rdfs-labels-for-designations.sparql .

The converse decomposition of TF Form profile ontology with blanks into a Term profile one involves existential instantiation in a way that goes beyond SPARQL or description logic. TF can accomplish it by combining blank node rewriting with SPARQL queries. The following query first retrieves the TF Form ontology grid.ttl with a query alias (deblank-) that skolemizes it, i.e. replaces blank nodes with URNs, then runs a CONSTRUCT query that merges the missing middle terms into the model (construct-designatedby-terms) and finally restores the blanks (-W reblank).

pellet4tf query -F -W reblank -q sparql/construct-designatedby-terms.sparql deblank-home:/io/grid.ttl

The following query does the same but names the blanks with TF descriptive names.

pellet4tf query -v -F -W relabel -S home:/owl/TFS.owl -q sparql/construct-designatedby-terms.sparql deblank-home:/io/grid.ttl

Compare also section on descriptive names .

TF Sign

The TF Sign profile reifies instances of the property sign:signifiedBy between instances of sem:Meaning and syn:Form. A typical TF Term profile entry shows the whole triad of concept, term, and designation.

[ a term:Term ;
  term:hasDesignation
    [ a exp:Designation ;
      exp:baseForm "petollinen ystävä" ;
      exp:catCode "N" ;
      exp:langCode "fi" ;
      meta:source "avain3" ] ;
  term:hasReferent term:seeFalseFriend ;
  meta:source "avain1" ] .

Given the reification, it is possible to characterize the designation and its assignment as term to the concept separately. (Here, they have different sources.)

In terms of RDF reification, class sign:Sign can be mapped to that subclass of rdf:Statement whose predicate is sign:signifies. Using OWL 2 property chains, we can explicitly define the property connecting form and meaning, sign:signifies, as the composition of the relations sign:formOf and sign:hasMeaning connecting signs to form and meaning, respectively. Inversely, signifiedBy can be defined as the composition of sign:meaningOf and sign:hasForm. Analogously for term:designates and its inverse term:designatedBy.

TF Legacy

Traditional (monolingual) terminology and lexicography profiles separate concepts from terms (Legacy) or words from senses (Dictionary). In Legacy terminology work, expressions need not be separated from terms, since in normative monolingual terminology expressions are monosemic. In descriptive Dictionary work, where word senses are assumed to be unique (no full synonyms), there is no need for language-independent concepts. TF Sign separates all three. The three-way separation is motivated in multilingual cross-domain term collections. Both Legacy and Dictionary profiles are restrictions of the Form profile where the many-many relation of signification between meaning and form is reduced into a one-many relation (Legacy: terms are monosemous) vs. a many-one relation (Dictionary: meanings are unique).

The separation of terms from expressions is fine tuning on the expression side. In Legacy, a term is a role: a term IS an expression that denotes a given concept. The same expression can have many roles as different terms, but there is just the one expression playing them all. In this profile, one expression cannot represent two terms with conflicting properties. To allow that, terms must be reified separately.

The Legacy profile is a reduct of the Sign profile obtained by identifying terms with designations. This notion can be formalised as follows.

_:c term:hasTerm _:t . _:t term:hasDesignation _:d . _:d exp:form "foo" .
_:c term:hasTerm _:t . _:t owl:sameAs _:d . _:d exp:form "foo" .
_:c term:hasTerm _:t . _:t exp:form "foo" .

_:t a term:Term , exp:Designation .
term:hasDesignation owl:equivalentProperty owl:sameAs .
term:Term owl:equivalentClass
    [ owl:intersectionOf ( exp:Designation
        [ a owl:Restriction ; owl:onProperty term:termOf ; owl:someValuesFrom ont:Concept ] ) ] .

In Legacy, terms sharing designation are not quotiented into designations. Nothing is lost as long as designations are specific to concept. Shared grammatical properties get stated redundantly.

c 1xm t nx1 d ==> c 1xm t 1x1 d ==> c 1xm t (one concept can have many terms, each term has one concept).

TF Dictionary

A term and its referent (individual, class) on the other hand, are on different sides of the Tarskian semantic divide: the former is a piece of language, the latter pieces of the world. A concept, that can be seen as a representative of an equivalence class of objects as well as one of terms, falls somewhere in between. By Stone representation theorem there is a bijection between the intensional view (a Boolean algebra of concepts, with individual concepts as atoms) and the extensional one (first order model theory with individuals and their classes).

To avoid category error, the TF Dictionary profile distinguishes meta level entailment properties between terms to match object level denotational relations between classes. E.g. sign:hyponymOf, sign:synonymWith, sign:meronymOf are term instance properties matching rdfs:subClassOf, owl:equivalentClass, ont:partOf in TFS concept (class/instance) vocabulary. Which vocabulary should be used between WordNet synsets depends on whether synsets are construed syntactically as (representatives of collections of) signs or semantically as instances of class Meaning. (We choose the latter.) The relation of two terms that share the referent is sign:synonymWith . It is also the relation holding between a term and a terminological definition of the same concept. This property lets us associate a definition text to a term sharing language with it, while maintaining with terminology theory that a definition defines the concept, not the term.

The Dictionary profile is a reduct of the Sign profile obtained by identifying meanings with senses. This notion can be formalised as follows.

_:y sign:hasSense _:s . _:s sign:hasLemma _:l . _:l syn:lemma "foo" .
_:y owl:sameAs _:s . _:s sign:hasLemma _:l . _:l syn:lemma "foo" .
_:s sign:hasLemma _:l . _:l syn:lemma "foo" .

_:s a sem:Synset , sign:Sense .
sign:hasSense owl:equivalentProperty owl:sameAs .
sign:Sense owl:equivalentClass
    [ owl:intersectionOf ( sem:Synset
        [ a owl:Restriction ; owl:onProperty sign:synsetOf ; owl:someValuesFrom syn:Lemma ] ) ] .

In Dictionary, senses sharing meaning are not quotiented into synsets. Nothing is lost as long as meanings are specific to concept. Shared meaning properties get stated redundantly.

y 1xm s nx1 l ==> y 1x1 s nx1 l ==> s nx1 l (one lemma can have many senses, each sense has one lemma)

TF Contrastive

Descriptive translation-oriented contrastive terminography does not start from, or aim at, interlingual concept harmonisation, but operates with pairwise term comparisons. In a TF Label implementation of this approach, terms start out as undivided entities. At the start of the game, we collect a cluster of similar terms like fi-korpi-N_-_ont-Forest , en-woodland-N_-_ont-Forest , and contrast their differences. In TF, such pairwise comparisons between terms may be reified into contrastive entries of type term:Contrast relating the terms of the comparison.

For instance, comparing English woodland to Finnish korpi , there is an overlap but not a complete equivalence. A woodland is a sparse wide forest, while Finnish korpi is a wild wide forest. This observation may be reified into an instance of term:Contrast with the English term as subject and the Finnish term as object, and marked as an approximate match; specifically, it gets the property term:approximate with value ~ for overlap. A sample of this approach is shown below.

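A hedged sketch of what such a reified contrast entry might look like; term:Contrast and term:approximate are from the text above, but the properties used here for the two sides of the comparison (rdf:subject, rdf:object) are stand-ins, not necessarily the ones the TF schema actually defines:

[ a term:Contrast ;
  rdf:subject term:en-woodland-N_-_ont-Forest ;
  rdf:object term:fi-korpi-N_-_ont-Forest ;
  term:approximate "~" ] .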

Such contrastive comparisons can also point the way for concept analysis. In TF Term profile, where terms and concepts are separated, the contrast may be described by splitting concepts with contrastive features (sparse, wild) and relating the refined concepts using ontology primitives (subClassOf, complementOf, disjointWith).

TF schema profile checking

If concepts, terms, and designations have inconsistent properties, they must be separate entities. Then we may say that the TFS profile is strict. A TF ontology can be tested for strict conformance to the higher TFS profiles using separation schemas LegacyStrict.owl and DictionaryStrict.owl . LegacyStrict checks that signs are separate from their meanings. DictionaryStrict asserts that signs are separate from their forms. If both are satisfied, we have strict TF Sign. An intermediate position is that term:Term is a proper subclass of exp:Expression. In that case, terms are expressions, but not (necessarily) just roles: an expression can be used by many conflicting terms.

The following Pellet queries check whether signs, forms, and meanings can be separated in ContrastiveLabel.ttl:

pellet consistency LegacyStrict.owl ContrastiveLabel.ttl
pellet entail -e LegacyStrict.owl ContrastiveLabel.ttl

However, as was pointed out in the category theory section, the points of the TF triad are relative, not absolute. While it makes sense to make semiotic relations like term:hasReferent irreflexive, so as to separate object and metalanguage levels, categorical separation axioms between the members of the semiotic triad are not strictly true. A special language term can be designated by a general one: for example, the English mathematical term for plus or minus is designated by the common English noun (a sense of the English word) sign . Or a sign can be the referent of another sign; for example, a quote refers to the text without the quotes. Since there is no unique name assumption, a TF ontology can choose whether URIs for terms, concept instances, and expressions point to different entities. They can be identified using owl:sameAs and separated using owl:differentFrom .

In general, the strict axioms cannot be enforced at all times. Rather, they serve to pinpoint potential problem spots during development. For instance, one cannot define Designation strictly as 'a designating expression', because during terminology work, not all designations have referents, nor as 'expression having a base form', because not all designations for concepts are lemma forms.

The following figure illustrates the relations between the different TF schema profiles. At the bottom is shown the RDF default, known as TF Label profile, where concepts are provided with language tagged rdfs:label strings. It is isomorphic with TF Light shown on top, where expressions and terms are not separated from (other) concepts and all properties issue from the same resource.


TF schema profiles

TF namespace conventions

Terminologists know that languages for special purposes are hardly separable from general language. In fact, special languages are more justly considered subsets of language at large. This is indicated in the TFS profile diagram by inclusion relations between concepts in the terminology namespaces ont, exp, term and those in the corresponding semiotic namespaces sem, syn, sign . Innovating somewhat, we can say that the former namespaces are sub-namespaces of the latter. Since there is no such notion in the W3C namespace standard, it needs the support of TermFactory specific conventions. The conventions are as follows.

Whenever a class, property or instance in the sub-namespace has the same local name as one in the super namespace, there must be an appropriate relation of inclusion between the namesakes. Thus for instance,

  • ont:Meaning is a subclass of sem:Meaning
  • term:hasDescription is a subproperty of sign:hasDescription

Special language concepts may thus narrow down general language meanings, for instance, TermFactory class ont:CountryCode for the ISO country codes is a narrowing of the more general concept sem:CountryCode that includes telephone codes and other similar conventions.
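Written out as axioms, the namesake convention amounts to triples like the following (a sketch; the exact axioms asserted in the TF schema files are not spelled out above):

ont:Meaning rdfs:subClassOf sem:Meaning .
term:hasDescription rdfs:subPropertyOf sign:hasDescription .
ont:CountryCode rdfs:subClassOf sem:CountryCode .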

Inclusion between instances reduces to the symmetric relation of identity, or owl:sameAs . Extensional identity notwithstanding, there are modal and pragmatic differences between aliases for the same resource. There is Frege's example of Venus as Morning Star and Evening Star. The Christian Holy Trinity of Father, Son, and Spirit is another case; nicknames, family names and role names for persons a third one. Different assertions are conventionally associated to different labels in different contexts, possibly by different communities. Those communities may differ about the identity of the resources, while for an outsider it may be convenient to treat them the same. All this happens in the Semantic Web, and therefore it is advised to leave room for intentional synonymy.

For instance, special language expressions are often just general language phrases playing a special role. The special language expression exp:en-limit-N for category theoretic limit is to all intents and purposes the same lexeme as the general language word syn:en-limit-N . By the TF namespace inclusion convention, they are indeed the same resource, i.e. triple exp:en-limit-N owl:sameAs syn:en-limit-N can be asserted. The namespace inclusion convention thus implies that special and general language expressions are not distinguished (as far as the standard namespaces of TF go). There is no such confusion between special and general language terms: a special language term term:en-limit-N_-_ont-Limit and general language word sense sign:en-limit-N_-_sem-Limit have different local names, just as the meanings are different (though nested).

TFS profile conversion

Consider what the freedom from ontological commitment in TF means from the point of view of data complexity and terminology evolution. Say we get a simple list of names. We can first enter them as instances of class Term. Sometime later the names get associated with some content. As long as there is no ambiguity, we may associate the content directly to the term instances. There is no need to split terms from expressions yet because there is nothing to share. When a need arises to disambiguate the expression, we may split the shared expression off the different terms. Dually, when a need arises to share content, we can split the shared meaning into a concept and relate synonyms to that. All this can be done just in time, on demand, not ex ante just because the data structure requires it.

Say we find at some point that we need to split a vague concept into two. For instance, the English word parse can be a noun or a verb, denoting the process of parsing or its result. For other languages like Finnish, the two concepts need to be separated, because they go with different words. We can keep the vague concept that holds the shared information and create new concepts/expressions that inherit shared features from the vague concept, and add the distinguishing features to new subconcepts. If the process/result ambiguity is pervasive, it can be coded as an axiom, rule and/or template.


parse
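A hedged Turtle sketch of such a split; the subconcept names are illustrative, and the shared information stays on the original vague concept:

ont:Parse a owl:Class .                       # the vague concept keeps the shared features
ont:Parsing rdfs:subClassOf ont:Parse .       # 'parse' as a process
ont:ParseResult rdfs:subClassOf ont:Parse .   # 'parse' as the result of parsing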

The TF schema profiles TF Label, TF Sign, and their intermediates serve among other things to help conversion of content to TF. The idea is that a given third party terminology collection can be converted to the TFS profile that is closest to the source structure. Afterwards, when appropriate, the content can be further converted inside TF to conform to another profile.

Experience with various conversions (echa and kyamk glossaries from Excel tables and puls from Lisp source) showed that conversion from ad hoc formats directly to TF is often easier than conversion through TBX, thanks to the non-hierarchical nature of RDF and OWL, which makes it possible to flexibly merge piecemeal partial information about resources obtained from different places in the source.

An experimental MS Excel workbook macro has been written which converts a simple excel table of term information into RDF triples conforming to TF Label. The triples are further stepwise converted into more complex TFS profiles using TF internal ontology tools.

TF Label does not separate concepts and expressions from terms. It is the easiest target to convert to from contrastive terminologies of the lexicographical type.

TF Sign separates concepts, terms, and expressions. It is most useful with multiple-domain term collections (like TF itself).

To support the different semantics, tools are provided to rewrite an ontology from one level to another. Conversion libraries only need to support direct conversion to the closest subset of TF. TF specific SPARQL conversion scripts automate upgrading/downgrading between levels by automating the necessary splits/merges. A subtask is under way to provide a library of conversion scripts between the different TFS profiles and associated validation and matching tools.

TermFactory Schema FAQ

The rest of this section is organised as a set of (not very) frequently asked questions about modeling classical terminology theory in TF.

Q: Has every term got a designation and a referent?
A: In TF Term profile, that is the design. Designation and Referent are named classes in TF Term; they are the ranges of the object properties term:hasDesignation and term:hasReferent, respectively. These properties are defined as key term properties in TFKeys.owl:
<owl:Class rdf:about="&term;Term">
  <owl:hasKey rdf:parseType="Collection">
    <rdf:Description rdf:about="&term;hasDesignation" />
    <rdf:Description rdf:about="&term;hasReferent" />
  </owl:hasKey>
</owl:Class>
Q: Is the designation of a term always an expression?
A: Yes. The range restriction is asserted in TFS.owl:
<owl:ObjectProperty rdf:about="&term;hasDesignation"> <rdfs:range rdf:resource="&exp;Expression"/> </owl:ObjectProperty>
Q: Is the referent of a term always a concept?
A: No. It can be an individual too. In TF, appellations are terms. The range restriction is asserted in TFS.owl:
<owl:ObjectProperty rdf:about="&term;hasReferent"> <rdfs:range rdf:resource="&ont;Referent"/> </owl:ObjectProperty>
Q: Do only terms have designations?
A: Yes. The domain restriction is asserted in TFS.owl:
<owl:ObjectProperty rdf:about="&term;hasDesignation"> <rdfs:domain rdf:resource="&term;Term"/> <rdfs:range rdf:resource="&exp;Designation"/> </owl:ObjectProperty>
Q: Do only terms have referents?
A: Yes. The domain restriction is asserted in TFS.owl:
<owl:ObjectProperty rdf:about="&term;hasReferent"> <rdfs:domain rdf:resource="&term;Term"/> </owl:ObjectProperty>
Q: Does a term have a unique designation?
A: Yes. Functional property assertion in TFStrict.owl:
<owl:ObjectProperty rdf:about="&term;hasDesignation"> <rdf:type rdf:resource="&owl;FunctionalProperty"/> </owl:ObjectProperty>
Q: Does a term have a unique referent?
A: Yes. Functional property assertion in TFStrict.owl:
<owl:ObjectProperty rdf:about="&term;hasReferent"> <rdf:type rdf:resource="&owl;FunctionalProperty"/> </owl:ObjectProperty>
Q: Can two different terms have the same designation and referent?
A: No. A term is uniquely identified by its designation and referent. Uniqueness assertion in TFStrict.owl:
<owl:Class rdf:about="&term;Term">
  <owl:hasKey rdf:parseType="Collection">
    <rdf:Description rdf:about="&term;hasDesignation" />
    <rdf:Description rdf:about="&term;hasReferent" />
  </owl:hasKey>
</owl:Class>
Q: Are the term and its designation different things?
A: Asserted in DictionaryStrict.owl:
<rdf:Description rdf:about="&term;hasDesignation"> <rdf:type rdf:resource="&owl;IrreflexiveProperty"/> </rdf:Description>
Q: Are the term and its referent different things?
A: Asserted in LegacyStrict.owl:
<rdf:Description rdf:about="&term;hasReferent"> <rdf:type rdf:resource="&owl;IrreflexiveProperty"/> </rdf:Description>
Q: Are terms and expressions disjoint classes?
A: Disjointness asserted in DictionaryStrict.owl:
<owl:Class rdf:about="&term;Term"> <owl:disjointWith rdf:resource="&exp;Expression"/> </owl:Class>
Q: Are terms and concepts disjoint?
A: Disjointness asserted in LegacyStrict.owl:
<owl:Class rdf:about="&term;Term"> <owl:disjointWith rdf:resource="&ont;Concept"/> </owl:Class>
Q: Are designations and referents disjoint?
A: Not asserted either way. Counterexample: a quoted expression is an expression which designates an expression.
Q: Are all expressions designations of some term?
A: Not asserted either way.
Q: Are all concepts referents of some term or other?
A: Not asserted either way.
Q: Can a concept be referent of more than one term?
A: Not asserted either way.

TermFactory schema properties

As a logical language, OWL supports redundant property and class names. This redundancy is useful: it allows expressing the same information in many ways, so as to choose the most convenient one for each purpose. The redundancy works like typing in computer languages to help catch category errors. Multiple property names for one relation with names selecting the participants is a familiar expedient in natural languages as well.

Such redundancy is used generously in TF in aid of brevity, to avoid explicit type triples. For instance, the following mean the same thanks to subproperty and range declarations in TFS:

hasDescription [ a Example ; ... ]
hasExample [ ... ]
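The kind of subproperty and range declarations that license the shorthand look roughly like this (a sketch; whether hasExample and Example live in the term namespace, and what the actual TFS.owl axioms say, are assumptions here):

term:hasExample rdfs:subPropertyOf term:hasDescription ;
                rdfs:range term:Example .
term:Example rdfs:subClassOf term:Description .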

One place where redundancy is in the way is in updates. The more redundancy there is, the more places there are where an update must be registered. For this purpose, it is good to have tools to remove redundancy from an ontology before it is edited or updated. (This point is elaborated in the sections on editing.)

The main classification tools in an OWL ontology are class and property hierarchies. A good TF sub-ontology bridges its top concepts (at least indirectly) to the TF schema and checks predefined classes and properties for fit before subclassing or inventing its own. It is good ontology writing practice to use classes ( rdf:type exp:English ) instead of distinguishing properties (like exp:langCode "en" ) for classification when workable, because description logic reasoners and editors support classes best. (The TF schema actually defines the class exp:English as the class of those things whose langCode is "en".) It is also good practice to build subproperty trees instead of a long flat list of properties. Subproperty relations allow alternative views on related properties and support querying properties by class instead of by name. This is particularly important with repository specific new types of properties. A query engine is able to return meaningful subsets on the basis of subproperty classifications without having to know the individual properties. This is valuable when the ontology collection is composed of contributions from independent sources.
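A sketch of the kind of class definition mentioned above, assuming it is given as an owl:hasValue restriction on the datatype property exp:langCode (the actual TFS axiom may differ):

exp:English owl:equivalentClass
    [ a owl:Restriction ;
      owl:onProperty exp:langCode ;
      owl:hasValue "en" ] .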

TF divides properties first of all into properties of terms, expressions, and concepts. A group apart are semantic properties in the sem namespace, used in the TF natural language interface. To make the classification of properties as concept, term, or expression properties explicit, we build explicit hierarchies of TF object and data properties in TFS.owl. Here is a representative sample:

meta:objectProperty
meta:conceptProperty
meta:termProperty
meta:expressionProperty

This property hierarchy is kept in a separate ontology document TFProp.owl . In it, each property also has a paronymous instance (pun) in namespace meta0. The puns are used in localizing property names. (Paronyms for properties are needed in OWL 2, because even OWL 2 does not allow punning property names.) The puns are classified into a property instance classification that runs parallel to the property hierarchy:

meta:ObjectProperty
meta:ConceptProperty
meta:TermProperty
meta:ExpressionProperty

The property classification simplifies the statement of property ranges, since the range of each type of property can be given just once for the type. Other properties inherit it from the type(s) they belong to.
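For example, a domain or range stated once on a property type propagates down the subproperty hierarchy (a sketch; whether term:hasDesignation is actually classified under meta:termProperty is an assumption here):

meta:termProperty rdfs:domain term:Term .
term:hasDesignation rdfs:subPropertyOf meta:termProperty .
# every subject of term:hasDesignation is then inferred to be a term:Term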

Term properties

In the TermFactory terminology model, a term is a special case of a two-faced de Saussurean sign that links a meaning or referent (signifié) with a form or expression (signifiant). In practical terminology work, the distinction between term and expression is not always made explicitly. In TF, it is an option motivated by multitopicality and multilinguality. Among other things, it allows separation of those properties that expressions have by general-language grammar from those properties that are associated to them only in a specific domain-dependent or special-language meaning.

The TF expression-term-concept triad instantiates de Saussure's notion of dyadic sign (Wikipedia s.v. sign). It is not identical to the triangle of reference or semiotic triangle of Peirce, Frege, and Ogden/Richards. In the semiotic triangle, the corners are named symbol (sign), thought (reference), and object (referent). Here, expression and sign are not distinguished. Instead, there is a split on the meaning side between intension and extension. This distinction goes back to Plato and Aristotle. The corners of the semiotic triangle are word (symbol/sign), mental affection (thought/intension), and thing out there (referent/extension). Aristotle's words-as-symbols or signs match TF expressions-as-terms (these two are not separated). The semiotic triangle thus corresponds to the TF Legacy profile. Mental affections (intensions) are best matched with TF classes and concepts. Things out there (extensions) have a reflex in TF instances, which denote such things.

In terminology literature a similar figure is known as the terminology triangle. Its corners are concept, term and object.

With respect to the TF triad, terminological properties tend to fall into three groups: those which hold of a concept independent of its expression (concept properties); those which hold of the expression (expression properties), and those which hold of the combination, or of one side relative to the other (term properties). However, TF does not take too strict an approach to the tripartition. Sometimes an expression property is better associated to a term (if the expression is too loose). Sometimes a meaning property is more correctly associated to a term (if the concept is too loose).

TF tries to avoid the necessity to postulate dummy elements just to satisfy type requirements. A simple glossary may not warrant creating even dummy concepts for the terms it describes. Descriptions can in TF be associated to concepts, terms or expressions. There is no necessity for a TF ontology to contain a single term at all to be well formed. There is no notion of "minimum entry" in TF. Any set of triples conformant to the TF schema can constitute a TF document.

By default, the denotation relation term:hasTerm or term:hasReferent between a concept and a term is exact. When it is not, exactness can always be restored by positing intervening concepts which match their terms exactly and whose extensions bear various subclass relations to one another (definable by rdfs:subClassOf ). This solution has the advantage that it subjects the semantic relations to OWL reasoning. As a shorthand, the TF schema also allows expressing degree-of-equivalence with property term:approximate with literal values false|hypo|hyper|true . These shorthands have no reasoner support without extra axioms or conversion to OWL. (Compare also TF Label .)

Two terms are approximately equal if they have nonempty meet and join, i.e. they are non-disjoint co-hyponyms. This much is implied by term:approximate value true . The rest of the values are interpreted analogously: hypo says that the term is a hyponym, i.e. its exact denotation is included in the given referent, and hyper says that the term is a hypernym, i.e. its exact denotation is wider than the given referent. The default value is false meaning that the match is exact (not approximate). A related TBX feature directionality connects semantically closest neighbors between approximate terms around some concept. This facility is not supported in TF as such.
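A hedged sketch of both options for the woodland example used in the contrastive section above; the intervening concept ont:Woodland and the second term name are illustrative:

# shorthand: the term's exact denotation is narrower than the given referent
term:en-woodland-N_-_ont-Forest term:approximate "hypo" .

# paraphrase with an intervening concept that matches the term exactly
ont:Woodland rdfs:subClassOf ont:Forest .
term:en-woodland-N_-_ont-Woodland term:hasReferent ont:Woodland .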

The degree of equivalence can be approximate when an expression is associated to a concept of another locale (culture) by way of translation. However, the property of being a translation is not another degree of equivalence. In TF, a term can be recognized to be a translation from a disparity between language code and country code . If the language and country codes associated to the term (either on the term itself or on one of its two sides) do not agree, the term is not vernacular, i.e. it is a translation.

Term property term:hasTransferComment and the description type term:TransferComment pertain to the TF Label use case. Many of its uses can be recoded in TF Sign in ways that are more transparent to automatic reasoning.

The terminology standard property termType takes as values such types of terms as phrase, full form, short form, variant, abbreviation, acronym. These become subclasses of class Term. Phrase subclasses Expression because phrasehood is largely parsable from the expression independent of its meaning. As in linguistics, TF countenances one word phrases. The terminological notion of phrase denotes phrases which are not words. Perhaps we want another notion for a terminological phrase (one that has two or more words as compositional parts).

As for the remaining values, the assumption is that an expression in a given meaning can be recognized on its form to be of one or another of these types, as defined in the standard. (For instance, a variant means an alternative spelling or allomorph of another form.) These can be considered term properties on the presumption that it is either computable or uninteresting just which other form(s) each form is related to as, say, an abbreviation or a variant. If so, they become classes like exp:Abbreviation or properties like term:shortFormFor, depending on the case. (Class term:ShortForm can be defined in OWL from property term:shortFormFor .)

A few TBX term types, like antonym of, false friend with, homograph with, should perhaps go unanalysed in the TF schema, if one wants to reason with them. Antonymy is a second order property related to the OWL notion of inverseOf; false friend and homograph are relations between terms with similar expressions but different referents. But an imported vocabulary can of course keep its own notion of antonym.

Property term:register holds a picklist of usage register values. Register tells about the situation where (to what sort of addressee, in what audience) the expression can be used. Possible values might include high, low, formal, scientific, poetic, colloquial, technical, slang, familiar, honorific, vulgar. For values, see http://www.isocat.org#interface/index.html .

Property term:usage takes picklist values describing how generally (by what part of the population) a term is used. Values might include rare, common, dialectal, obsolete. There is a slippery slope here toward register (some labels like learned might describe usage or register).

Property term:attitude lists picklist values for what an expression tells about the attitude of the author to the referent. Values might include meliorative, pejorative, augmentative, diminutive, euphemistic, hypocoristic. These three data properties are not functional.

Descriptions

Descriptions, in terminology theory, are usually textual descriptions of concepts or terms. In TF, a description is any object that describes another object. The TF schema leaves it open what type of TF object a description is. It makes room for descriptions which are not textual. In terms of the TF ontology, a description can be anything: a form, a meaning, or a sign. We want to keep TF terminologies flexible about multilinguality and semantic explicitness. For this reason, TF descriptions can be signs or meanings, and they can be associated to any TF objects. This tolerance allows associating language-independent descriptions to meanings and language-dependent descriptions to signs. The former tactic is the natural one when one plans to machine generate language-dependent descriptions from a common interlingua. The latter tactic allows semantically unanalysed localized descriptions which have no equivalents in other languages.

The latter approach offers itself if the subject field has not been subject, or is not amenable, to concept analysis. This happens in new fields and areas where meanings are sacrosanct or a bone of contention (like religion or politics). This is where the whole distinction between concept and term is somewhat artificial, with little basis or even interest in distinguishing the concept from the term. (See TF schema profiles .)

Since TF is weakly typed, we can leave types of items open and reify them later. A language specific explanation can start out as just an exp:Text node with a string associated to a specific term in that language. TF does not try to enforce any one discipline here. The character of the link can be inferred from the types of the entities at the two ends. If the explanation gets translated to more languages, it starts to make sense to create an ont:Content node for the shared content to keep the translations together. If the explanation link is kept as it was, other translations for the explanation are accessible over its sign:hasContent link. But if a multilingualised explanation is associated to a language independent concept, it makes more sense to create an explanation link between the contents.

TF does not have built into it the TBX notion of a language set, a container node that keeps together the items in a given language related to a concept. But nothing prevents constructing such collector nodes, as an intermediary step to TBX conversion, using SPARQL for instance.

Two well known subclasses are definitions and explanations. Explanations are descriptions which explain the use of a concept but do not have to separate it from other concepts. In TF, term:Explanation is a subclass of term:Description .

Definitions

There are two traditions for what definitions are or should be like. The Aristotelian-terminological tradition places a concept in a taxonomical system of concepts in terms of a superordinate concept (genus) and distinguishing features (differentiae). A definition in this tradition is the definiens, a phrase of the same syntactic category as the definiendum that can be substituted for the definiendum salva veritate . In the logical tradition, a definition is a theory that fixes the meaning of a term in some model (or class of models). Such a definition can be explicit (an equivalence whose sides are the definiens and definiendum) or implicit (a sentence or set of sentences in which the definiendum occurs).

TermFactory countenances both types of definitions. An explicit concept definition of the terminological type is a textual sign of type term:Definition that bears relation term:definitionOf (a subproperty of ont:referenceOf co-hyponymous with term:referentOf ) to the concept it defines.

A logical definition, in an explicit sentential equivalence form or an implicit or contextual definition, is a description of type sign:Definition that bears the relation sign:definitionOf to another similar object. It does not entail that the sign and its definition are intersubstitutable as such.

Finally, there is a simple data property meta:definition that just associates a string to something else as its definition. This type is provided for TF Label , and expected to go away in conversion from Label to Term.
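A hedged sketch contrasting the simple data property with the reified description; the concept, the definition string, and the properties carrying the actual text of the reified sign are invented or left out here:

# TF Label style: just a string
ont:Forest meta:definition "land covered chiefly with trees and undergrowth" .

# TF Term style: a reified textual sign linked to the concept it defines
[ a term:Definition ; term:definitionOf ont:Forest ] .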

RDF has the property rdfs:isDefinedBy used to point from a resource to a document that (among other things) contains a definition (or at least context) for it. It is more like a bibliographical reference than a definition link.

OWL has its own definitional primitives owl:sameAs and owl:equivalentClass . For a TF definition to be correct, the definiens and its definiendum must be logically equivalent in OWL terms, i.e. the relation owl:equivalentClass (or owl:sameAs in the case of individuals) holds between the referents of the definiens and the definiendum. The definiens and the definiendum can have (in terms of sign:hasMeaning ) the same TF meaning but they need not. The meanings can also be different resources (have different URIs), as long as they are OWL equivalent. In general, a concept can have many extensionally equivalent, but intensionally different definitions, all equally correct. (For discussion see Overloading OWL sameAs.)

Circular definitions are not incorrect, but they may be uninformative. If we want to use reasoning to prevent circular definitions in TFS, term:hasDefinition must not be an equivalence relation, but a strict partial order (irreflexive, transitive, asymmetric). It is an intensional refinement (a spanning tree subset) of the equivalence relations owl:sameAs/owl:equivalentClass .
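As a sketch, the characteristics would be declared along these lines; note that OWL 2 DL does not allow a transitive property to be declared irreflexive or asymmetric, so a DL reasoner can enforce this only partially or with help from outside DL:

term:hasDefinition a owl:ObjectProperty ,
    owl:IrreflexiveProperty ,
    owl:AsymmetricProperty ,
    owl:TransitiveProperty .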

TF (v 1.5) embeds class term:Description in a more generic class meta:Description . In terminology theory, it is a requirement that descriptions (definitions, examples etc.) have source indications. term:Description requires source. General language dictionaries, on the other hand, don't usually give sources. The more generic class meta:Description does not require sources.

Concept properties

At the top level, the TF triad divides signs into form and meaning. A related distinction that cross-cuts meaning is the duality between referential, object-like entities (which exist or not) and informational, proposition-like entities (which are true or false). On the object side, a TF ontology contains terms that relate designations and concepts. On the proposition side, a TF ontology contains messages with text and content . In each case, there is some common kernel which gets reified into a language-independent concept on the one hand, or language independent content on the other hand. The "moving parts", the language specific frills, get associated to this kernel as so many expressions for it.

The TF class ont:Concept is the class of first-order representatives, or puns, of OWL classes. ont:Content stands for language independent meanings with a propositional semantics (usually, the interlingua used is English, but it can also be some other language, formal or natural.) exp:Text stands for its concrete expression in some language. term:Description is a role for (mostly propositional) representations of concepts, including definitions, explanations and the like.

It is a small innovation to extend the TF meaning-sign-form triad to propositional content. The extension allows associating multilingual definitions etc. to a concept in one go, leaving it to the system to sort out language-specific versions. (TBX tries to do something similar with its language set grouping, but the connection between matching language versions of the same definition are not explicit in it.)

A third central logical distinction is the distinction between individuals (tokens) and classes (types). This distinction is represented in OWL DL with the first order logic class owl:Individual , the second order metaclass owl:Class and the property rdf:type that expresses class membership between an individual and a class. In terminology theory, the distinction between individual and class corresponds to the distinction between (general) concepts, designated by (general) terms, and individual (concept)s, whose designations are sometimes known as nomenclature or appellations (ISO 1087-1:2000:2,6).

Product or brand names (such as CocaCola), like many other type-token ambiguous terms, are undecided vis-à-vis the individual/class distinction. As names of individual companies, they act as proper names. As names of product (line)s, they behave as classes with instances. The individual/class distinction is not absolute: what should count as an individual and what a class is a practical question that depends on what kind of reasoning support is required. If the ontology is to support stock accounting, it may be better to treat CocaCola as a class and cans as individuals. If the ontology is for tracking company news, CocaCola can be treated as an individual.

Stock keeping units and part numbers can be accommodated as well. TF is not a tool for real time stock accounting, but it can be useful to attach a TF ontology to such a system for semantic search or localization purposes.

Traditional terminology theory divides concept relations into three types:

  1. generic (intensional, concept) relations. These are expressed in TF with the relations rdf:type and rdfs:subClassOf plus OWL primitives like owl:disjointWith .
  2. part-whole relations. These are in TF represented with the relations sem:hasPart and ont:hasPart and their inverses sem:partOf and ont:partOf .
  3. the rest, lumped together as functional relations. These are represented in TF by whatever seems fit in each case.

ont:partOf stands for the extensional part/whole relation between instances, and sem:partOf the corresponding relation between concepts, i.e. the language-neutral concept-level equivalent of meronymy between terms. The relationship between ont:partOf and sem:partOf is not definable in terms of extensional logic, for the latter has modal import. For instance, a given bicycle chain is by definition and design a bicycle part, whether or not it ever becomes part of any given bicycle. It is enough for it to be built for the purpose. That is, the intensional triple ont:BicycleChain sem:partOf ont:Bicycle is equivalent to the triple ont:BicycleChain rdfs:subClassOf ont:partOf Some ont:Bicycle only in the best of possible worlds.
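Spelled out in Turtle, the two readings contrasted above are:

# intensional, concept-level statement
ont:BicycleChain sem:partOf ont:Bicycle .

# extensional paraphrase (equivalent only in the best of possible worlds)
ont:BicycleChain rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty ont:partOf ;
      owl:someValuesFrom ont:Bicycle ] .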

In ordinary language, the relation between a whole and its parts is generic: the parts are (stereo)typical of the whole. Bikes have wheels, without going into detail which do and how many. The minimum (that some bikes have some wheels) is less than intended, while the ideal (that all bikes have one to three (?) wheels) may be too much. Lexicographical practice leaves the relation implicit to allow for variation. An ontology of a particular bike model may detail the parts list extensionally.

Subject field classification

A further important concept property in TF is the property meta:hasSubjectField . It relates meanings to a thesaurus, or taxonomy of concepts naming fields of human inquiry. Bibliographical thesauri form a looser thematic classification between concepts using broader/narrower term relations as defined in bibliographical thesaurus construction (see the Simple Knowledge Organization System (SKOS)). Thematic relations can be construed as inclusion relations between document collections: a concept is narrower than another if documents about the former are included in documents about the latter. Consequently, meta:hasSubjectField is not identical to skos:broader in TFTop.owl. TF meta:hasSubjectField is the relation between a terminological concept and a bibliographical subject field. The range of meta:hasSubjectField is different from its domain, so in general, the relation is not transitive. A TF concept is thematically narrower than another such concept if the range of the meta:hasSubjectField property for the former is included in the range of that property for the latter. The SKOS concept broader (more properly, broader or equal) relation includes, but is not exhausted by, rdfs:subClassOf relations. For example, partOf relations may entail thematic inclusion as well (car parts belong to the broader domain of cars). Concrete subject field names are normalized into the plural, while terminological classes are normalized into the singular. And there are other, functional relations between subject fields as well.

Frequency is represented by the property meta:frequency taking real number values and by the picklist valued term property term:usage with traditional values like common|rare .

Geographical usage is indicated by the datatype property ont:ctryCode with values in the ISO standard of country codes. The corresponding class valued property is ont:hasCtry .

Connotations

The TermFactory ontology schema contains property sem:hasConnotation for connotations, that is, non-denotational semantic associations, other entities that a given entity "brings to mind". Expressions, terms, and concepts all have connotations. Unlike denotations, connotations tend to spread by association. They may or may not be definable in terms of more specific relationships (i.e. there may or may not be a cause or good reason for a connotation). Connotations can be associated to any ont:Object .

Often, connotations are good or bad, i.e. they associate a value judgment to something. A value judgment is a subjective preference by a value subject. A preference is a binary comparative relation. To express that death has negative connotation (in common opinion), one may write ont:Death sem:hasConnotation sem:Bad . In this case, we can probably say more: death does not just connote something bad, it actually is a bad thing (in common opinion), i.e. ont:Death rdfs:subClassOf sem:Bad .

Expression properties

TF distinguishes expressions from terms. Expressions are language-specific strings with grammatical identity and properties and (hence) some implicit general-language base meaning. Nothing prevents describing general-language meanings explicitly in TF, in which case those general-language form-meaning pairings appear as TF signs. For instance, general cross domain vocabulary relevant to information extraction might be stored in TF in just this way. Also it can be useful to be able to relate new or unstable special language meanings to their general language homonyms (e.g. long-term unemployed ). Rich general-language resources on the expression side make it possible to machine generate from one "terminological lemma/lexeme" a maximum variety of compositionally predictable derived forms without separate listing.

The term-expression distinction offers two ways of representing term elements, or such parts of terms which deserve to be identified in TF. Parts which have grammatical relevance but no terminological meaning appear as expressions. Those parts which have independent terminological meaning become terms of some less prominent grammatical category. There is nothing in TF to prevent adjectives, adverbs, compound parts, roots or affixes from being terms. (But compare below.) Lexicalized phrases like handle with care are accommodated in the sister subclass sign:Message of sign:Sign . Other fixed special language textual units like standard formulas or texts might be included here as well. Note that TF does not make a distinction in kind between "main" and "accessory" units like terms vs. definitions. They are all equally objects of description in TF.

An obvious property for expressions (but not exclusively for them) is exp:langCode with values in the ISO standard of language codes.

Another central property is datatype property exp:catCode with string values abbreviating parts of speech. These part of speech codes are associated with corresponding expression classes which imply the same code. No confusion here: though the ontology class exp:Adjective implies catCode A on its instances, the English expression en-adjective-N is not one of them and has catCode N. The corresponding class valued property is exp:hasCat .

A third crucial property of expressions is exp:baseForm which spells out the lemma or base form of an expression. A base form can be any valid XML, i.e. properly nested mixture of Unicode strings and XML tags. The corresponding object property is exp:hasBaseForm .
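
Putting the three expression properties together, a hypothetical English noun expression might look as follows in Turtle; the resource name follows the descriptive naming convention discussed later, and exp:Noun as the class behind catCode N is an assumption.

@prefix exp: <http://tfs.cc/exp/> .

# an English noun expression with language, category and base form
exp:en-parser-N exp:langCode "en" ;
    exp:catCode "N" ;
    exp:hasCat exp:Noun ;   # assumed class corresponding to catCode "N"
    exp:baseForm "parser" .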

Other properties

Some properties are too generic to fit into any of the above categories. The meta namespace houses properties which contain TF metadata.

Terminology theory insists on documentation of sources. TF provides a class meta:Source (say, a book or a person), with properties like exp:url to identify the source and meta:locus to identify a place in it (say, a URL fragment). Sources are associated to terminology objects with object property meta:hasSource . A simple datatype property meta:source takes a URL as value. Any TF object can get sources. Some convention may be needed about default propagation of source indications.
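
A sketch of the two styles of source indication on a hypothetical term; the blank-node source, the URL and the locus string are invented for illustration.

@prefix : <http://example.org/demo/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix exp: <http://tfs.cc/exp/> .

# object-valued style: the source is a described resource with a locus
:someTerm meta:hasSource [ a meta:Source ;
      exp:url "http://example.org/glossary.html" ;
      meta:locus "#section-3" ] .
# datatype style: the source is just a URL string
:someTerm meta:source "http://example.org/glossary.html" .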

Database subsetting should not call for special machinery in TF over and above namespaces and user definable OWL classes/properties. Boolean property hidden is an example of ad hoc repository management.

rdfs:comment is the annotation property to use for ad hoc metalevel comments. No particular constraints apply to what such annotations may contain. owl:versionInfo is used for version management. There should be only one version info element in each TF ontology version. (Old version info items can be saved as some other annotation such as a comment.)

There may be a need for symmetric cross-references between objects of various types. The TF top pointer property is meta:see . The name is an exception to the rule that object valued properties should be defined as hasSomething. This is because see properties are symmetric (self inverse). The subclasses can have names like term:seeFalseFriend or seeHomograph .
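
A sketch of how such a cross-reference subproperty could be declared and used; whether the TF schema itself asserts these axioms for term:seeFalseFriend is not claimed here, and :term1 , :term2 are invented resources.

@prefix : <http://example.org/demo/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix term: <http://tfs.cc/term/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# a see-type pointer is a symmetric subproperty of the top pointer meta:see
term:seeFalseFriend rdfs:subPropertyOf meta:see ;
    a owl:SymmetricProperty .
# a symmetric cross-reference between two terms
:term1 term:seeFalseFriend :term2 .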

The current TF schema is only a beginning. TF namespace owners are free to invent new properties. It is good to subclass new properties from existing ones, to reduce clutter and help query data by property type. If successful, they may migrate higher up in the pyramid and eventually end up in the TF schema.

TF grammar

One of the long term goals of having an explicit machine-processable expression ontology is to make it possible to parse and generate natural language text from and to TF ontologies. With a natural language generator and the TF schema vocabulary, it should become possible to generate definitions and explanations of concepts from an ontology automatically in the different languages covered by the expression ontology. With a natural language parser and the TF schema vocabulary, it should become possible to parse natural language queries to the ontology and to carry out natural language commands from users. Together, these facilities could support a multilingual human-machine dialogue system. TF could become able to explain itself and let users modify it in natural language.

Syntax in TF

TermFactory is a lexicon manager. It is designed for enumerations, not regularities - beyond those that are best recorded using OWL anyway. OWL is best with some types of semantic regularity. Grammar, including syntactic parses, in general should not need to be stored in TF, since regular forms can be parsed and generated by parsers and generators based on much more efficient principles. In a multilingual terminologist's paradise, TF would not need to store predictable compositional phrases at all, since they could be generated from some interlingual representation with a multilingual generator (GF for instance). In practice, canned phrases do get stored, even in GF grammars, because grammar is not always regular. There is no strict borderline between grammar and lexicon.

The general recommendation remains that TF should restrict itself to providing the minimum of syntax that is necessary for a general purpose parser to produce a unique parse for a phrase. Sometimes, nothing is needed, if the phrase is unambiguous or uninflected. In practice, what is useful to store is a function of the completeness and sophistication of the grammatical processors used. For some languages and cases, it is good to know the head word of a phrase to be able to inflect the phrase using a morphological processor. Since the needs vary, TF should not try to be over explicit about how to code syntax.

Grammatical Framework (GF)

Grammatical Framework (GF) is a syntax-oriented multilingual parser/generator that uses a formal language as interlingua. At the level of lexicon (at least), it should be possible to convert TF ontology and terminology to GF format so that GF can be used for ontology verbalization and localization.

TF provides a rudimentary TF ontology to GF lexicon converter in the form of a TermFactory model writer that can be chosen as the Get utility output format with --format=GF . A GF write of a TF ontology must be provided with the name of the domain lexicon given by switch --lex and optionally the language code of the GF concrete grammar given by switch --lang .

An example TF format lexicon file Finnish.ttl looks like this.

Show/hide TF format GF lexicon

Here are example command lines to convert the ontology to a pair of abstract and concrete GF lexicon modules.

tfget -F --format=GF --lex=Foo home:/gf/Finnish.ttl
tfget -F --format=GF --lex=Foo --lang=Fin home:/gf/Finnish.ttl

Below are the results. The module is written to a directory under a name conformant to GF module naming conventions when option --out points to a directory. (The directory name should end in a slash.)

Show/hide GF abstract lexicon module

Show/hide GF concrete lexicon module

GF uses English based three-letter language codes (ISO 639-2/B). TF two-letter language codes (ISO 639-1/T) can be mapped to the GF codes using a mapping file specified as conf option TF_GF_MAPPING . The default mapping is at home:/etc/gf/gf-mapping.n3 . The format of the mapping file matches that of a TF alias file, except for different RDF vocabulary. The mappings are applied the same way as location mapping pattern rules. The same mapping file is used to project TF syntactic frame descriptors to GF linearisation terms.

Show/hide TF to GF mappings

The first batch of TF to GF mappings converts between TF and GF language codes. The second batch localizes GF linearization patterns as human readable verbal descriptions of syntactic frames.

Morphology in TF

In special language terminology, morphology plays a minor role. Terms as lexical innovations tend to have simple morphology. In terminology collections, expressions associated to concepts are given in some base form largely in abstraction of derivational or inflectional morphology. To the extent morphology is predictable, it should not need to be in the RDF repository. Unpredictable morphological tags and exceptional forms can be stored.

Inflection

Classical linguistics distinguishes between a lexeme as the paradigm (class of forms) of a word and the individual forms. This distinction is reflected in TF as follows.

syn:Label rdfs:subClassOf syn:Form . syn:form rdfs:subPropertyOf rdfs:label ; rdfs:domain syn:Form .
syn:Lemma rdfs:subClassOf syn:Label . syn:lemma rdfs:subPropertyOf syn:form ; rdfs:domain syn:Lemma .
exp:Expression rdfs:subClassOf syn:Form .
exp:Designation rdfs:subClassOf exp:Expression , syn:Lemma . exp:baseForm rdfs:subPropertyOf syn:lemma ; rdfs:domain exp:Designation .

A lexeme, as a paradigm (class) of forms of one word, is not reified in TF as such, but as in traditional lexicography, represented by a lemma form. A terminological lexeme, or Designation, is an exp:Expression that has an exp:baseForm string. It is related by syn:baseFormOf to other inflected forms (if listed). The inflected form points to the base form with syn:hasBaseForm. A derived form like exp:fi-epäkunnossa-P can be analogously connected to exp:fi-kunnossa-P by some appropriate syntactic property, say syn:hasDerivedForm. A terminological designation (like fi-epäkunnossa-P ) need not be the lemma form of the lexeme it derives from.
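
A minimal Turtle sketch of the lemma/form arrangement for the Finnish example; how much of this a real TF ontology would actually list is left open, and the typing of the inflected form as syn:Form follows the schema triples above.

@prefix exp: <http://tfs.cc/exp/> .
@prefix syn: <http://tfs.cc/syn/> .

# the designation carries the base form string of the lexeme
exp:fi-kunto-N a exp:Designation ;
    exp:langCode "fi" ;
    exp:baseForm "kunto" .
# a listed inflected form points back to its base form
exp:fi-kunnossa-P a syn:Form ;
    syn:form "kunnossa" ;
    syn:hasBaseForm exp:fi-kunto-N .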

Designations are a subclass of Expressions. Expressions have forms, Designations have base forms. More precisely, an expression that has a base form is a designation. (There is nothing as yet saying that base form is unique, or that a Designation could not have other forms besides base form/s.)

Except for semantically or terminologically conditioned special cases, morphology is better not enumerated in TF, but by a morphology processor. The natural approach is to provide a base form and enough tags to it to allow generating the whole paradigm from them using some morphological processor. Forms that are unpredictable or carry unpredictable meanings may be listed in TF.

Number and count are often better associated to concepts than to instances. An instance like ont:Java can belong at the same time to ont:Program , which is count, and to ont:Software , which is noncount. TFS defines string valued property exp:number and a class valued property exp:hasNumber .

Classical terminology conventions and domain thesaurus conventions about number differ. Terminology puts concept designations in the singular if the concept allows singular, while thesauri use singular for abstractions and plural for concrete collections. In the interest of reusability, it appears best not to try to impose a strict discipline here. Roughly, referents of a plural concept (members of a plural class) are pluralities (collections of one or more individuals), while those of a singular one are singular individuals. TermFactory does not require a systematic difference between the class Shark and the class Sharks (but TF sites or users are free to make one).

A lexicalized inflection like exp:fi-epäkunnossa-P 'broken' has no base form. Its opposite exp:fi-kunnossa-P is an inflected case of base form exp:fi-kunto-N . Should exp:fi-epäkunnossa-P be related to exp:fi-kunto-N , and if so how? One option is that exp:fi-epäkunnossa-P is simultaneously classified as an exp:Expression and as an inflected syn:Form whose syn:hasBaseForm is exp:fi-kunto-N .

Derivation

Say parse is a term in the category Verb. It has a productive agent noun parser . Are they the same term or two? We need to split this into two questions. Pro primo: are there two expressions (lexemes) here with different resource identifiers? Pro secundo: do we need to enumerate and store them in the repository? The answer to the first question is probably positive. parser , though regularly derived from parse , is another lexeme, with its own set of forms. It belongs to a different part of speech than parse . The associated concepts are also different. One classifies processes, the other people. Assuming people are disjoint from processes, they cannot be the same class. One resource cannot have conflicting properties, so we have two. In another language, the relation of the corresponding terms need not be as predictable, and we want to be able to tell which translates which.

How to implement this in practice needs thought. Assume there are resources en-parse-V and ont:Parse in the ontology, but no resource en-parser-N or ont:Parser . How does one query for 'the agentive noun for the verb parse '? Will a query for string pattern "parse" include the agentive noun? Will a query for "parser" find it? Can one find an entry for en-parser-N_-_ont-Parser if it is not already stored?

For the first question, we might write a special query handler for a triple like en-parse-V exp:hasAgentiveNoun ?x . The query handler for this predicate would dispatch to a special purpose reasoner, perhaps implemented by a morphological processor. For the string match query the query engine would need to expand derivatives to find matches for unlisted strings. As for the last question, parsing unknown resource names on the fly requires rewrite rules that unpack the unknown resource identifier into a query. (Location mappings might just swing it in this particular case, but that is not what they are good for.)

The cparse natural language parser/generator was adapted to serve this purpose in the predecessors of TermFactory, the 4M and CoGKS dialogue systems (see graph below). The cparse generator has a Java/Jena converter that converts between linguistic feature structures and RDF graphs. There is a tool that converts TF ontologies into cparse dictionaries. cparse version 71 was completed in May 2009. Test grammars have been developed for multilingual parsing and generation of concept definitions between Finnish, English, and Chinese.

Show/hide CoGKS architecture

CoGKS architecture

TF frames

The purpose of the TF semantics (TFSem.owl) is to provide enough semantic analysis of general language to support a simple interlingua suitable for typical terminological definitions which can be parsed from and generated to multiple natural languages. The longer term aim is to reduce or obviate the need to manually verbalise definitions that are built following standard rules of terminology from a concept system formalised in TF.

For instance, the concept ont:Parser could be defined in a TF ontology with superclass ont:Program and with property sem:hasFunction value ont:Parse (which in turn could be defined with superclass ont:Analyze having property sem:Object value ont:Text ). This already constitutes an interlingua from which it is not difficult to machine generate simple phrases like a program to analyze text to verbalize the OWL definition. (The language independent parser/generator cparse has been tested on this example.)
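
A hedged Turtle rendering of that definition, using OWL hasValue restrictions and punning the classes ont:Parse and ont:Text as values; the property names follow the text above and may differ in an actual TF ontology.

@prefix ont: <http://tfs.cc/ont/> .
@prefix sem: <http://tfs.cc/sem/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# a parser is a program whose function is to parse
ont:Parser rdfs:subClassOf ont:Program ,
    [ a owl:Restriction ; owl:onProperty sem:hasFunction ; owl:hasValue ont:Parse ] .
# to parse is to analyze text
ont:Parse rdfs:subClassOf ont:Analyze ,
    [ a owl:Restriction ; owl:onProperty sem:Object ; owl:hasValue ont:Text ] .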

Compare also the ontology design patterns website.

Verbs

The TF general language semantics builds on the TMAD model of tense, mood, aspect, and diathesis. In this model, verbs denote event types, whose instances are events. Event types are built up from states using a regular algebra. States connect timeless OWL classes to the event ontology: a class (instance) like ont:Person is related to event type sem:State by bearing role sem:predicateOf to an instance of state sem:Be . An instance of the class carries the role sem:subjectOf to the state. Reflecting this construction, event aspect splits up into a fourfield of states, processes, changes and cycles. Examples of each are be, breathe, die, blink , respectively. Each event type has roles associated to it. All event types have time and place. States have a subject; changes and cycles have source, goal, and path; transitives have agent, object, and instrument; animates have aims, means, and function.

The English verb be denotes a state. Jesus is savior translates to Jesus sem:subjectOf _:s . _:s rdf:type sem:Be . _:s hasPredicate ont:Savior , and entails Jesus rdf:type ont:Savior .

Basic OWL class assertions have states in ont:ClassState . They are timeless: their event instances _:c satisfy subjectOf _:c subClassOf predicateOf _:c . An example is Fido is a dog . For class states, sem:subjectOf is effectively a first-order pun of the membership relation rdf:type between an instance and class. For most states, only stages can be projected to class triples. For instance, Jesus was a baby in 1 A.D. , represented by _:e hasTime 1_AD . _:e hasSubject Jesus . _:e hasPredicate Baby . only projects (entails) staged timeless class assertions like Jesus_in_1_AD rdf:type Baby . or Jesus rdf:type Baby_in_1_AD .

The TF event algebra allows the whole gamut of descriptive choices from syntactic event types to the dual view of state based temporal logic models. Which modeling we use may depend which is shortest, and we can mix the views. Recall the ideas of temporal model system as diffs to a model, the duality of objects and events, the duality of events and time, etc.

More event types get constructed with event modals such as causation (transitives) and animacy (agentives). Event type BecomeEvent is a change that shares subject. Event type CauseEvent is a connection of two events one the cause of the other. Event type MakeEvent is the cause-become frame. Event type DoEvent is the frame for animate agency.

Nouns

Some nouns are role nouns, and can be represented by properties. For instance brother is semantically a property ont:brotherOf with inverse ont:hasBrother . Absolute occurrences are existentially quantified, as in brotherOf Thing , '(somebody's) brother'. A genitive case or possessive verb with such nouns matches the built-in property value, so that brother of Jesus is just brotherOf Jesus and Jesus has a brother Jakob is Jesus hasBrother Jakob .
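
A Turtle sketch of the role noun treatment; the individuals and the class :Brother are invented for illustration, while the property declarations follow the prose.

@prefix : <http://example.org/demo/> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# the role noun as a property with its inverse
ont:brotherOf owl:inverseOf ont:hasBrother .
# "Jesus has a brother Jakob"
:Jesus ont:hasBrother :Jakob .
# the absolute use, "(somebody's) brother", as an existential restriction
:Brother owl:equivalentClass
    [ a owl:Restriction ; owl:onProperty ont:brotherOf ; owl:someValuesFrom owl:Thing ] .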

Many nouns have role frames like verbs. For instance road , like its hyperonym passage , has the same roles as the verbs pass or go , which the nouns reify: source, path and goal. road to India can be represented as r rdf:type ont:Road . r ont:hasGoal ont:India .

Many nouns are derivatives of verbs and inherit their semantic frame from them. An event noun like parsing in parsing is hard means the same as the infinitive in it is hard to parse . Semantically, both the noun and the infinitive can stand for the class pun ont:Parse or (if a particular event is meant) some particular event instance in the class ont:Parse . An agent noun like leader means one who does the leading, i.e. the sem:doerOf of an event instance in ont:Lead . An instrument noun like parser is related by the sem:instrumentOf role to the event type ont:Parse .

Semantic features of nouns like sem:Animate and sem:Human are in TF semantic classes. (OWL/RDF classifications are, after all, rdf:type features.) Such semantic features name categories that commonly get grammaticalised in natural language. For instance, many languages distinguish animate from inanimate and human from non-human vocabulary. For most purposes, sem:Animate could be defined as the domain of the intelligent agent role sem:doerOf and sem:Human as the domain of the social agent role sem:playerOf . Thus the TF category sem:Human does not designate genus Homo but includes organizations and fictional characters. There is no built in semantic feature sem:Abstract in TF schema. Languages rarely grammaticalize abstractness as such (separate from the count/noncount axis). Indeed, it is hard to make an exhaustive split between the concrete and the abstract. Abstraction is a graded and many-dimensional affair rather than a two-way classification. Many concepts have more or less abstract and concrete uses. Abstractness has to do with the type-token distinction, concreteness with space, time and causation. What counts as type and what token varies. TF may leave it to applications to draw a line where needed.

Abstract nouns can be related with metamodeling (punning). For instance, we can say that red is a color with sem:Red rdf:type sem:Color and that color is a property with sem:Color rdf:type sem:Property . This stays first order, because the puns sem:Red and sem:Color are not classes but instances. As Plato would have it, the adjective beautiful and the abstract noun beauty can both designate the same OWL class sem:Beauty ; the difference is grammatical. (Compare the common terminology theory position mentioned above that terminology can get by with just nouns.)

Adjectives

Some adjectives code classes, like sem:Female for female referents. Many more represent comparative relations. A comparative relation like "bigger than" can be coded directly as an object property biggerThan in OWL.

An alternative representation in some ways truer to natural language is the choice function representation, where the positive form of the adjective like "big" is taken as basic. But the adjective is not represented as a class, but as a relation to a comparison class: big is short for big for some class or other. A big animal belongs to class bigFor ont:Animal and a small animal to a disjoint class smallFor ont:Animal . The comparative form "bigger than" becomes a special case of the positive form, where the comparison class consists of two objects.
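
One possible Turtle rendering of the choice function idea, with the property name sem:bigFor and the individual :Jumbo assumed for illustration; the comparison class ont:Animal appears punned as the value of the restriction.

@prefix : <http://example.org/demo/> .
@prefix sem: <http://tfs.cc/sem/> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# "Jumbo is a big animal": Jumbo belongs to the class bigFor ont:Animal
:Jumbo a [ a owl:Restriction ;
    owl:onProperty sem:bigFor ;
    owl:hasValue ont:Animal ] .

The disjoint class smallFor ont:Animal would be built analogously, and the two restriction classes declared disjoint.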

A third popular way to represent comparative relations is as a measure property, for instance a datatype property ont:length with values in length measures. Measure morphisms can be used with nominal scales too: three valued as in gender m|f|n or two-valued female true|false for classes like sem:Female . The choice of representation depends on the use. Nothing stops using more than one alternative coding. (For background theory see e.g. Krantz et al. 1972). TFS.owl defines a string valued property exp:gender and a class valued property exp:hasGender . The following three forms are equivalent under TFStrict.owl:

:x exp:gender "n" . :x a exp:Neuter . :x exp:hasGender exp:Neuter .

Some adjectives are more than two-place, so some reification is needed. A preference relation is a many-place relation between a preference subject, a respect of comparison, and at least two objects to compare. OWL directly supports only two-place relations, so we need to curry the arguments somehow. We solve this as follows. We use two disjoint properties ont:goodIn and sem:badIn relating an object to a ranking to represent comparison with choice functions. The comparison class or choice set is an instance r of class ont:Ranking . Property sem:hasRanking maps (animate?) subjects to Rankings that they hold. These constructs allow representing the four-way relation 'subject s prefers x to y in respect r' in binary relations. For instance, the preference "four legs good, two legs bad" is represented by triples r rdf:type ont:Ranking, four_legs ont:goodIn r, two_legs sem:badIn r , and the value judgment assigned to the animal farm f as f sem:hasRanking r .

The subjectivity of the preference is coded by the ranking r, which plays the role of a context in possible worlds semantics. Nobody else but the farm need have just this ranking. An absolute positive sem:Bad can be defined by fixing a ranking, for instance sem:CommonPlace for rankings common to all people.

There is a long standing grammatical tradition starting from Aristotle's Categories that puts together a number of related notions under the rubric of antonyms (Aristotle's enantia "contrary"). Aristotle noted that antonymy is polysemous. Some cases are easy to do in OWL, like the case of relation converse (inverse) between, say, greater and less , expressible by owl:inverseOf . Some may be less obvious depending on the coding, like the antonymy of good and bad under the choice function representation, which amounts to an owl:disjointWith of the classes Good and Bad relative to a given ranking. Or the antonymy between begin and end, which requires an analysis of the events to constituent states. The difference to just slapping on the traditional label is that the reasoner knows what to infer from the relation. For human consumption, the traditional label may be all one wants.
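
Two of the easy cases could be spelled out roughly as follows; the property names ont:greaterThan , ont:lessThan and the class sem:Good are assumptions, and the bare disjointness axiom leaves out the relativization to a ranking discussed above.

@prefix ont: <http://tfs.cc/ont/> .
@prefix sem: <http://tfs.cc/sem/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# converse (inverse) antonyms
ont:greaterThan owl:inverseOf ont:lessThan .
# contrary antonyms as disjoint classes (relative to a fixed ranking)
sem:Good owl:disjointWith sem:Bad .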

Many adjectives are derivatives of verbs and inherit semantic frame from them. Some are participles, like missing person meaning person who is missing. Some look just the same but are paronyms from event nouns, like dancing shoes meaning shoes for dancing.

Some adjectives come from nouns. A paronymous adjective like American from America can express generic genitive/possessive case sem:hasRole : the adjective American can mean the same as the genitive America's. The paronymy relation between American and America can be defined in TF like this:

<owl:Class rdf:about="&ont;American">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="&sem;relatedWith"/>
      <owl:hasValue rdf:resource="&ont;America"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

This says that the class American is a subclass of those things that are related to the country America. The TF top semantic property sem:relatedWith may also serve as the default semantics for the genitive case and the possessive verb have . Which subproperty of being related with is relevant depends on the terms of the relation.

The following exemplifies TermFactory conventions about names of languages, countries, and nationalities. The name of a language belongs to the expression namespace, and is the class instance (pun) of the corresponding class of expressions. The name of a country and its paronymous adjective are as explained above. Nationality subclasses the country adjective (classifies persons associated with the country by being its nationals). Language related meanings are distinguished from country related meanings by the namespace.

item                                             concept               term
the language (noun)                              exp:Finnish           term:en-Finnish-N_-_exp-Finnish
belonging to the language (adjective)            exp:Finnish           term:en-Finnish-A_-_exp-Finnish
the country (noun)                               ont:Finland           term:en-Finland-N_-_ont-Finland
belonging to the country (adjective)             ont:Finnish           term:en-Finnish-A_-_ont-Finnish
person of that nationality (adjective or noun)   ont:FinnishNational   term:en-Finnish-A_-_ont-FinnishNational, term:en-Finn-N_-_ont-FinnishNational

Adverbs

Adpositions are transitive (complemented) adverbs, including pre- and postpositions. A semantic case is an inverse of a semantic role. Pre- and postpositions and morphological cases in inflected languages are used to express semantic cases.

Polysemy

Polysemy is the use of words in many related meanings. Less inflecting languages tend to be more polysemous than inflected ones, which prefer explicit paronymy with derivational affixes. In TF one can manage polysemy using an interlingual approach on a small scale, so as to minimize the size of the semantic network (OWL graph).

Polysemy includes metonymy, regular or creative shift of meaning from one semantic entity to a different extensionally related one. It is different from homonymy, or the use of the same expression for unrelated meanings. Take as an example the use of the name of a country like Finland between animate (the people) and inanimate (the region), as in Finland fought Russia and Finland borders Russia . If we treat this as a homonymy, we distinguish two (individual) concepts ont:PeopleOfFinland and ont:RegionOfFinland , the former in class sem:Animate and the latter in sem:Inanimate , and two homonymous terms en-Finland-N_-_ont-PeopleOfFinland , en-Finland-N_-_ont-RegionOfFinland . If we do the same with all country names in all languages, we get a lot of homonyms.

Instead we can use an interlingual approach and treat this as a regular case of metonymy in the semantics. There is a language independent (individual) concept ont:Finland of type ont:Country . It is related by relations like ont:hasPeople and ont:hasRegion to the concepts PeopleOfFinland and RegionOfFinland , which belong to classes ont:People, sem:Animate and ont:Region, sem:Inanimate , respectively. There is only one term en-Finland-N_-_ont-Finland pointing to the country. The country is an extensional whole consisting of a region and a people (among other things perhaps). It may or may not make sense to class the whole ( ont:Finland, ont:Country ) as animate or inanimate, since it has parts in both classes. (But one may feel that countries are primarily regions.) Finland is small can be true or false depending on which class is used in the ranking. It is left to external processing to resolve the polysemy in context. Perhaps we need some indication to know which properties are acceptable metonymies.
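
In Turtle, the interlingual treatment amounts to something like the following; the class and property names are those used in the prose, and whether they exist as such in a TF ontology is not asserted here.

@prefix ont: <http://tfs.cc/ont/> .
@prefix sem: <http://tfs.cc/sem/> .

# one interlingual individual concept for the country
ont:Finland a ont:Country ;
    ont:hasPeople ont:PeopleOfFinland ;
    ont:hasRegion ont:RegionOfFinland .
# the metonymically related concepts and their semantic classes
ont:PeopleOfFinland a ont:People , sem:Animate .
ont:RegionOfFinland a ont:Region , sem:Inanimate .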

For another example, Finnish distinguishes between a verb valmistaa 'prepare', event noun valmistus 'preparation' and product noun valmiste 'preparate'. Assume the event and the product need separating to different concepts, say PrepareEvent and PrepareProduct (the latter related to the former as productOf PrepareEvent ). We need three signs relating the three forms to the two concepts. The English family prepare/preparation/preparate is similar, except preparation has process/product polysemy. If this is a common occurrence, create a superclass Prepare relating the two senses PrepareEvent and PrepareProduct and make preparation denote that.

The English word parse can be a verb or a noun denoting the event of parsing. (This common type of ambiguity is known as conversion in English grammar.) Instead of creating two signs for each part of speech and meaning, make just one form en_parse_NV designate one concept ont:Parse , leaving the polysemy to semantics. Split ont:Parse to ont:ParseEvent and ont:ParseProduct to accommodate Finnish jäsentää/jäsennys , where jäsentää is the verb and jäsennys is a noun polysemous between event and product senses.

Resource names and addresses in TF

Terminology is about naming of resources in a special field. The problem of localizing and globalizing names and addresses is at the core of TermFactory. The global web resource identifiers (URLs, URIs, IRIs, URNs) are unambiguous, but they are also long, ugly, and hard to remember. TF bridges between global and local names.

Semantic Web orthodoxy, like terminology, requires that its resource identifiers are monosemous, viz. globally identify a unique resource. There is no converse requirement that entities associated to resource IDs should be mononymous, viz. have just one IRI associated to them. There would be no way to enforce it. For TF, this means that though a given TF IRI should identify just one terminology entity, there is nothing against one terminology entity having many IRIs aliased to it. This is a useful insight, because it allows a variety of different IRIs systematically associated to a given term or other entity depending on need. For some purposes (decentralised creation, search, or debugging), an IRI descriptive of the thing named is useful. For other purposes an encrypted or character-encoded version (data protection, encoding issues), or just a unique numeric URI (version maintenance), may be preferable.

The main reason why the choice of IRIs for expressions and terms is significant is what is sometimes termed the ontology hell . That is the situation (already present) where effectively the same concept is invented and reinvented many times over, with different if only slightly varying resource names and essentially the same meaning, by many authors who then face the problem of ontology matching to find if they are really talking about the same thing or not. The problem of homonymy or polysemy besetting natural language words is just replaced by an equally confusing problem of synonymy between globally unambiguous identifiers. The problem is known in terminology theory as the harmonisation problem between different terminology standards. This is an unavoidable result when creation of resource names is not centrally controlled by some hub or authority the way web domain names are. Since central control is not realistic, we must face decentralized creation of TF identifiers. The least we can do then is to try to agree on some standard of naming that helps relate competing proposals.

For some applications and resource types, it makes sense to maintain permanent arbitrary IDs for resources, so that descriptive properties can be changed without losing identity of the resource. But (it is hoped) terms are less subject to change, or they are always identified by their key properties rather than by name. Anyway, for human inspection of ontologies, for quick queries, or when transferring data across a variety of converters and editors, it helps to have less arbitrary, descriptive identifiers for entities. Numeric IDs are very easy to get wrong and the errors are hard to catch. Some ontology editors can show the user a node's label instead of or alongside its IRI, or let the user tell the editor how to construct a display label for a node from selected key properties. The TF factor utility also supports switching between alternative IRIs.

Identifiers vs. keys

The IRI of a resource (the value of the ID attribute rdf:about in XML/RDF) identifies it. More generally, an ID in RDF or OWL is an inverse functional (datatype) property. Resource (description)s that have the same ID are (describe) the same resource.

In everyday practice, natural language expressions and terms are not identified by ID, but by description, a set of key properties, such as language, string label, and some further category key to distinguish homonyms. Then terms or designations that share these keys are by definition identical.

The philosophical preference in TermFactory is to identify terms and designations by key properties rather than by name. Blank designations and terms identified by key properties are sustained by TF. Systematic identifying descriptive names (IRIs) for terms and designations are defined and programmatically supported as a convenience. Automatic means for switching between keys and descriptive names are provided. This also simplifies the problem of TF ontology matching: two descriptions of a term or designation are identified if they share descriptive name or their key properties match.

OWL2 introduces a construct for class specific key properties owl:hasKey . It allows defining keys for a given class. A hasKey axiom states that each named instance of a class is uniquely identified by a (data or object) property or a set of properties - that is, if two named instances of the class coincide on values for each of key properties, then these two individuals are the same. The validity of the assumption that the properties of a descriptive identifier are really keys can be stated and tested using a reasoner.
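
For illustration, a hasKey axiom in the spirit of TFKeys.owl might state that designations sharing language, base form and category are the same individual; the actual axioms in TFKeys.owl may differ in detail.

@prefix exp: <http://tfs.cc/exp/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# named instances of exp:Designation are identified by these three properties
exp:Designation owl:hasKey ( exp:langCode exp:baseForm exp:catCode ) .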

Keys are a generalization of identifiers. Between identifiers and keys, there are clear differences but there is also a very subtle difference (see here):

  • owl:hasKey can be used to define "composite keys" (that is, keys that comprise several properties);
  • owl:hasKey is a characteristic of a class, while owl:InverseFunctionalProperty is a characteristic of an owl:ObjectProperty;
  • an owl:hasKey axiom can involve a mixture of object properties and datatype properties;
  • owl:hasKey can only infer equality if the key values are present explicitly.

A key axiom of the form HasKey( owl:Thing ( ID ) () ) is similar to the axiom InverseFunctionalObjectProperty( ID ), the main differences being that the former axiom is applicable only to individuals that are explicitly named in an ontology, while the latter axiom is also applicable to anonymous individuals and individuals whose existence is implied by existential quantification.

TermFactory schema document TFKeys.owl (a subset of TFStrict.owl) defines key properties for TF forms and signs as follows:

Show/hide TFKeys.owl

A common task is to merge term collections. When blank designations identified by key are merged, an OWL reasoner is needed to identify duplicates by comparing their keys. Here is an example: two identical terms that share referent and designation. When these documents are merged under RDF reasoning, everything appears in duplicate. TFKeys axioms and an OWL reasoner allow merging the duplicate entries into one.

@prefix sign: <http://tfs.cc/sign/> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix syn: <http://tfs.cc/syn/> .
@prefix term: <http://tfs.cc/term/> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sem: <http://tfs.cc/sem/> .
@prefix meta: <http://tfs.cc/meta/> .

[ a term:Term ;
  term:hasDesignation [ a exp:Designation ;
      exp:baseForm "petollinen ystävä" ;
      exp:catCode "N" ;
      exp:langCode "fi" ;
      meta:source "avain3" ] ;
  term:hasReferent term:seeFalseFriend ;
  meta:source "avain1" ] .

@prefix sign: <http://tfs.cc/sign/> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix syn: <http://tfs.cc/syn/> .
@prefix term: <http://tfs.cc/term/> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sem: <http://tfs.cc/sem/> .
@prefix meta: <http://tfs.cc/meta/> .

[ a term:Term ;
  term:hasDesignation [ a exp:Designation ;
      exp:baseForm "petollinen ystävä" ;
      exp:catCode "N" ;
      exp:langCode "fi" ;
      meta:source "avain4" ] ;
  term:hasReferent term:seeFalseFriend ;
  meta:source "avain2" ] .

The Pellet reasoner applies the hasKey axioms in TFKeys.owl to infer owl:sameAs statements that connect resources that have the same class and the same key attributes for the class, provided that the keys are explicit and pairwise identical. The TF Factor utility operation identify substitutes the identities so as to merge identical resources into one. In the above example documents, the designations already have the same class and keys explicitly, so the following query

tfquery -W=identify -Q='DESCRIBE ?term WHERE {?term a term:Term}' -e=Mixed -F avain3.ttl avain4.ttl ../owl/TFKeys.owl > avaimet.ttl

suffices to identify the designations. The identification of the designations is not directly propagated to the terms, so a subsequent query

tfquery -W=identify -Q='DESCRIBE ?term WHERE {?term a term:Term}' -e=Mixed -F avaimet.ttl ../owl/TFKeys.owl

is needed to identify the terms. The result:

@prefix exp: <http://tfs.cc/exp/> .
@prefix term: <http://tfs.cc/term/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix meta: <http://tfs.cc/meta/> .

term:seeFalseFriend a owl:Thing .

[ a term:Term , owl:Thing ;
  meta:source "avain2" , "avain1" ;
  term:hasDesignation [ a exp:Designation , owl:Thing ;
      exp:baseForm "petollinen ystävä" ;
      exp:catCode "N" ;
      exp:langCode "fi" ;
      meta:source "avain4" , "avain3" ] ;
  term:hasReferent term:seeFalseFriend ] .

<http://tfs.cc/owl/TFS.owl> owl:versionInfo "TF-Schema version 2.0 15.12.2010"^^rdf:PlainLiteral .

Query result_size 15 http://localhost/TermFactory/query?&W=identify&Q=DESCRIBE+%3Fterm+WHERE+%7B%3Fterm+a+term%3ATerm%7D&e=Mixed&r=home%3A%2Fowl%2FTFKeys.owl&r=home%3A%2Fio%2Favaimet.ttl&Z=2014-04-28T10:16:41.912Z

Alternatively, Factor operation names can be used to give blank designations and terms descriptive names. Identically named entries get merged by RDF alone. Afterward, the descriptive names can be factored back into blanks and keys with Factor operation keys.

Descriptive names

If a term is strictly identified by a set of key properties, any mistake in any of those properties (say a simple typo in the label) identifies a different term. It is never possible to update an existing term, because updating a key feature in effect creates a new term. For maintenance of terminological descriptions, we may want something that is a little more robust than key properties, but still transparent enough to make it easy to recognize likely duplicates. This is provided with a descriptive naming convention that exposes the similarity of competing conceptualisations.

TermFactory introduces a descriptive naming convention for designations and terms. It is not a norm, but a useful convention. It is ok to use other naming conventions, or leave designations and terms anonymous (blank).

The naming convention is this: the local name of an expression is formed from the expression's language code, base form, and category (usually, part of speech code is sufficient, but it can be another sufficiently distinguishing tag if not). Different authors should be able to arrive at the descriptive name of an item independently and be reasonably confident that another author adopting the same or similar descriptive name is after the same or a closely related notion. What we want to avoid as far as possible is the need of a separate global catalogue of resources.

The three parts of the local name are separated by hyphens. For example, the English language greeting hello in TFS (if it belonged to the TFS vocabulary) could get the descriptive name

syn:en-hello-S

Here S is a tag for the part of speech of a complete sentence/utterance and syn is a conventional prefix for the TFS form namespace http://tfs.cc/syn/ . Usual terminological conventions are to be followed: base (dictionary lemma) form if appropriate, number singular if available, no capitalization, no booleans (and/or/not), no metatext, no punctuation. The expression part should qualify as a search string for the expression without further parsing. It should not contain any notation belonging to metalanguage, such as parentheses, unless of course parentheses are really part of the designation. On the other hand, we must allow for variation in the name caused by standard encoding of extraneous characters in URIs (URLs/IRIs, as the case may be).

The descriptive label for a term or sign is formed by concatenating the descriptive label of its designation with some namespace prefix for its referent and the referent's local name, separated by hyphen. Since namespace prefixes are not globally registered, this naming convention is only suggestive of the referent of the term, and the main purpose of the prefix is to serve as a suggestive distinguisher. The corresponding namespace is found on the referent of the term. The TF home site of the term (identified by the term's own namespace) maintains an index of the prefixes it uses for the purpose ( home:/etc/sparql/prefix.ttl ). The designation and referent parts of the term label are separated by the string _-_ . For example, say we have a concept (meaning) for greetings with name sem:Greeting . Then the sense of English hello as a greeting could be labeled descriptively as

sign:en-hello-S_-_sem-Greeting-1

Hyphen and underscore are used as the separators in the labels because they are the least reserved punctuation-like characters in the many Semantic Web character set conventions. These separator strings should not occur inside the parts they separate in a way that could cause ambiguity. Here sign is a conventional prefix for the sign namespace and sem is the prefix for the meaning namespace. The number at the end is an optional arbitrary sense distinguisher.

The descriptive naming convention represents a compromise between a human readable name and a perfect hash key . The idea is to choose key identifying properties of an expression (form) or term (sign) and form the descriptive name from them. The properties are supposed to be real keys in the relational database or OWL 2 hasKey sense , so that they uniquely identify the expression/term. Two different items should not end up with the same descriptive name, and optimally one item should get only one such name. The main brunt of identification is borne by the site URL prefix, so if two sites are to share a resource, the site prefix had better be the same. But even if not, and two sites happen to define the same object, the local part of the descriptive name should help identify the key features of the named resources for harmonisation.

Given descriptive names, a TF term ontology containing nothing but designations or terms carrying descriptive resource names can be a useful resource as such. It is already a well formed instance of a TF Label ontology. For searching and browsing, it may be enough to look for descriptive identifiers. If the descriptive name is properly constructed, the key properties of the resource can be directly read off it with a SPARQL 1.1 query.

Show/hide TF descriptive name

TF terms

There is no equally catholic naming convention for concepts in TF, for the simple reason that we expect concepts to come with predefined names from third party ontologies. But when the factor utility relabeler finds a concept without a name associated to a term that has an expression designating it, the relabeler tries to use that expression's base form to generate a camel cased name for the concept. For instance, a blank concept designated by an expression with base form 'concept without name' will get an IRI with local name ConceptWithoutName . This convention can be useful when converting third party word lists to ontologies. If namespace and representative camelscript synonym are not enough to identify a concept, the descriptive concept IRI can be suffixed with another freely chosen distinguisher, say part of speech and/or sense number:

sem-Greeting-noun-2

TFS has not got a descriptive labeling convention for IRIs of multilingual messages or their texts. A practical reason is that they will be too long. A principled justification is that texts have a weaker identity. They are less generally reused, are more susceptible to editing, have fewer properties, and are less likely to get accidentally reinvented by different authors. Altogether there is less motivation for them to carry their identities on their sleeves.

Phrases that fall somewhere between words and texts may also be too long to have unabbreviated descriptive identifiers. Terms in non-Latin alphabets can take twice or more space in bytes than in characters. As a compromise between a descriptive identifier and an arbitrary one, a descriptive identifier may be abbreviated by truncating the baseform string to fit the practical size limit plus a trailing dot and asterisk.

http://tfs.cc/icd10/term1/ru-Болезнь,_вызванная_ВИЧ,_с_проявлениями_других_злокачественных_новообразований_лимфатической,_кроветворной_и_родственных_им_тканей-N_-_icd10-B21.3

would be truncated to the Mediawiki title size limit of 255 bytes as

http://tfs.cc/icd10/term1/ru-Болезнь,_вызванная_ВИЧ,_с_проявлениями_других_злокачественных_новообразований_лимфатической,_кроветворной_и_род.*-N_-_icd10-B21.3

A practical benefit of descriptive IRIs is that they carry the key properties of the term on the IRI, which speeds up searches. A more long term advantage of using descriptive IRIs for TF expressions and terms is that such IRIs help avoid creating duplicate resource names when the creation of new resources is not centrally controlled. Namespaces avoid name collisions across sites, but inside a given namespace, contention between IRIs is at least more easily detected and remedied when the name is descriptive of the resource. A corresponding weakness of this convention is that terms which differ only insignificantly (say, by spelling variant) may get created alongside one another and need to be mapped as equal or related after the fact. This danger could be minimised by additional conventions on the choice of representative name. On the other hand, even with a slightly leaky convention, the similarity of close variants is less likely to go undetected.

A descriptive identifier becomes false to fact when a key property changes. Say there is a typo in a key property of a designation, or a new variant is introduced that should replace a preceding one, not just appear alongside it. It is not enough to remove the old variant from its home ontology. The deprecated variant may have established itself in other ontologies, which should in due course also switch to use the new one. The best the home ontology can do is to advertise the change and recommend an update. (This is quite analogous to deprecation of software interfaces.) The TF Factor utility connects a relabeled resource to its previous name with an owl:sameAs property. This is detailed in the chapter on workflows .

Generating descriptive names from key properties is supported by the TF factor utility. But the SPARQL 1.1 query language is also able to construct and parse TF descriptive names directly. The query home:/etc/scripts/construct-descriptive-exp-iri.sparql is an example. Conversely, if a term or expression has a well formed descriptive identifier, a SPARQL query may be able to generate corresponding triples from it. Consult home:/etc/scripts/construct-keys-from-iri.sparql for a sample. The following query parses descriptive names back into RDF triples:

pellet4tf query -F -q sparql/construct-exp-keys.sparql -F2 ../owl/tf-TFS.owl

The script makes use of a TF specific SPARQL extension function afn:match() which knows how to match a string with a regex pattern and extract the match.

The script home:/etc/scripts/construct-tfs-full-for-rdfs-labels.sparql generates a TFS Term ontology of terms and designations with descriptive labels from RDF localization labels:

pellet4tf query -q home:/etc/scripts/construct-tfs-full-for-rdfs-labels.sparql home:/io/school.n3 home:/etc/prefix.ttl

Prefixes are not part of an RDF graph, only an abbreviatory device for serializing a graph as a document. The TermFactory reserved property meta:prefix allows coding prefix information in the RDF graph so that it becomes accessible to RDF graph processing. The input file prefix.ttl in the above query tells SPARQL which prefix to use for dbpedia-owl when constructing the descriptive term IRI. The relevant contents are shown below.

@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix meta: <http://tfs.cc/meta/> .

dbpedia-owl: meta:prefix "dbp" .
dbpedia: meta:prefix "dbp0" .

Character encoding in TF resource names

Besides genuine aliasing where different IRIs are used to point to the same TF entity, resource identifiers may need to get character escaped to conform to different standards. Non-Latin script support in today's Semantic Web tools is surprisingly weak and the character conventions in different SW standards and tools far from uniform ( Auer et al. 2010 ). In OWL 2.0, ontologies and their elements are identified using Internationalized Resource Identifiers (IRIs) [RFC3987], while OWL 1.x uses Uniform Resource Identifiers (URIs). For some purposes, it is safer to use URL encoded (percent encoded) versions of non-ascii IRIs. But in general, now that IRIs are reasonably well supported by RDF applications, TermFactory resource identifiers had best be IRIs, using percent encoding only as necessary. The IRI reserved characters that need percent encoding in IRIs are listed in RFC 3987 as follows.

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

For instance, percent and dot are allowed in an IRI, but most other Latin punctuation is not, and needs to be percent encoded. An absolute IRI has a scheme prefix. An IRI scheme prefix must start with a letter.

Universal resource locators (URL) were defined 1994 in Internet Engineering Task Group's request for comments RFC 1738 . Here, hostless absolute file urls had three slashes in a row.

Universal resource identifiers (URI) were described 1994 in RFC1630 . It was recommended that file urls should always mention host, but allowed localhost and even void host meaning localhost. RFC 2396 in 1998 separates net urls and file urls. As pointed out in a 1999 follow-up, RFC 2718 , double slash starting a naming authority was not allowed in file URLs whose authority (aka scheme-specific part) is empty. Java absolute file URLs start with file:/ . The 2005 URI recommendation RFC 3986 is back to having three slashes for hostless absolute file urls.

Universal resource names (URN) were defined 1997 in RFC 2141 . According to RFC 3986, a URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location"). The term "Uniform Resource Name" (URN) has been used historically to refer to both URIs under the "urn" scheme [RFC2141], which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name.

International resource identifiers (IRI) were defined 2005 in RFC 3987 . IRIs require the authority double slash though the authority part can be empty, so absolute hostless file urls have three slashes. The Jena IRI resolver (v. 2.9.4) complains if the authority slashes are missing.

To summarize:

  • RDF(S) 1.0 (in practice if not in theory) used URIs, RDF 1.1 uses IRIs
  • SPARQL supports IRIs
  • OWL 1 used URIs and OWL 2 uses IRIs.

In the abstract RDF data model ( version 1.1 ), there are only IRIs. IRIs allow all characters beyond the US-ASCII charset. In some situations – notably HTTP retrieval – it is not allowed to transmit non-US-ASCII chars in the network identifier, so the IRI has to be converted to a URI using the process sketched in the note above and formally defined in RFC 3987.

RDF/XML and Turtle prefixed names have their own reserved character sets, which regrettably include the percent sign. In Turtle, using prefixed names is not a necessity, for IRIs can be used as such in angle brackets. But then one loses the abbreviatory power of prefixes. In Turtle prefixed names, reserved characters can be escaped as in Java with "\u" prefixed to a hex encoding of the Unicode codepoint. In RDF/XML, there is currently no way around character escaping, as property names can only be expressed by QNames in RDF/XML. TF3 encoding is an attempt at a least common denominator between different escaping requirements. It is UTF-8 percent encoding of the reserved characters of the IRI encoding with percent sign replaced by the unreserved (though potentially ambiguous) string "u00". TermFactory treats these different ways of escaping an IRI as equivalent, i.e. as the same resource name. (By the IRI standard alone, they are not equivalent.)
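
For instance (an invented illustration, not an actual TF resource), a local name containing a comma would surface in the three escaping styles as follows, and TermFactory treats all three as naming the same resource:

IRI form:               http://tfs.cc/exp/en-foo,bar-N
percent encoded form:   http://tfs.cc/exp/en-foo%2Cbar-N
TF3 encoded form:       http://tfs.cc/exp/en-foou002Cbar-N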

Absolute TermFactory URIs like http://tfs.cc/ont/Concept or http://localhost:8080/TermFactory/query?uri=http%3a%2f%2ftfs.cc%2font%2fEnglish can contain a scheme (protocol) http:// , authority (host:port) localhost:8080 , path /concept , and query ?query=... . A fragment identifier after the hash sign like #Concept at the end of a URI is not officially part of the URI, but just a URI reference. Such references are often used for ontology concept identifiers (though not in TF). A relative URI is a suffix of an absolute URI with scheme left out.

In general, resources should be created in the namespace of the most general ontology they belong to. Distributed ontology development is between the rock of ontology hell and the hard place of inconsistency. Technically, inconsistency is easier to negotiate than ontology matching. For this reason, URIs for common language (LGP) expressions are best shared. TermFactory sites should avoid inventing their own instances of common language expression resources. The TF factor utility supports this convention by creating descriptive URIs for expressions in the TFS namespace http://tfs.cc/exp/ . At the same time, a site that shares a resource from TF or elsewhere should not assert common properties of such shared resources before checking for consistency or redundancy against existing data. If it is not feasible to check consistency, for instance when converting third party content whose semantics is not clear, it may still be better to use a new namespace, so as to err on the side of redundancy instead of inconsistency. At least, the source of the suspect, redundant or conflicting data remains traceable.

Hash vs. slash vocabularies

Uniform Resource Identifiers (URIs) globally identify a resource (they point to just one thing in the world), but not all of them are web addresses, or Uniform Resource Locators (URLs), identifying a web source holding a description of that resource.

A common convention for naming resources with URIs is to append the local name of a resource as a fragment identifier to a URL, separated by the cross-hatch or hash character #. By the Semantic Web addressing orthodoxy, this suggests that a resource URI should be described in a document obtained from the given URL, at a location pointed to by the fragment identifier. Web servers only serve complete URL documents; fragments are dealt with at the client end. The fragment is not part of the request sent to the server. The server never sees it, hence cannot react to it in any way. A complete term ontology has to be downloaded to access the description of a fragment in it. This is not practical for TermFactory resources, which is why TF prefers slash URIs for entry-size units.

The hash URI convention suggests that a URI of form http://host/path#resource should point to a location in an ontology document at URI http://host/path . In actual fact, this is not how it works most of the time. As things go, an ontology resource URI like http://host/path#resource may not resolve to any document in the websphere. Even if there is an ontology document at http://host/path , unless the document is HTML, the fragment identifier after the hash (#) will not single out anything in it.

Besides, in general it makes little sense to think of an ontology resource as a fragment of any one particular ontology document, since TF resources can be described at many locations in one document and in many different documents. A technical difficulty of the hash vocabulary convention is also that fragment identifiers are not part of the HTTP URL that gets sent between servers. A client can only ask for and receive a complete document from a server. Fragment identifiers are only meaningful for clients. (For discussion, see Cool URIs for the Semantic Web , RDF best practices , WordNet URIs , Jeni Tennison's blog , and Hebeler et al. 2009:58 .)

A better design decision for TF ontologies is to use resolvable URLs (known as a slash vocabulary) for ontology resources to begin with. TermFactory's own vocabulary is a slash vocabulary. However, fragment identifiers (hash vocabularies) are in general use, so TF had better have ways to handle them too.

URI abbreviations

There are a variety of ways of abbreviating URIs in ontology documents. For XML, URIs must be abbreviated in element and attribute names with namespace prefixes. A URI like http://tfs.cc/ont/Concept must be abbreviated with a prefix to something like ont:Concept in order to pass for an XML element name. (Such prefixed names are called qualified names, or QNames, in XML jargon.) Such prefixes can be invented ad hoc, and they remain ad hoc in the sense that there is no authority to maintain more than a handful of such prefixes globally. The prefixes currently in force can be declared at the root RDF element of an ontology RDF/XML file. Rewriting ontologies with ontology editors can unexpectedly change familiar prefixes.

Another abbreviatory device, usable where prefixes are not allowed (XML attribute value strings are one such place), are XML entity references like &tfs; for the URI prefix http://tfs.cc/ . Such entity references start with an ampersand and end with a semicolon. They are declared at the top of an XML file in a DOCTYPE element before the root element.

Beside these abbreviations, there are a few (perhaps somewhat riskier) tricks. An XML root attribute xmlns="http://tfs.cc/ont/" defines a so-called empty namespace prefix, which is assumed whenever an XML element appears without a prefix. Another abbreviatory attribute is xml:base , which can be set to a URI prefix and is used to resolve (fill out) relative URIs, for instance orphan URI fragments like #Concept in attribute values. Dealing with relative URIs is convenient, but not without risks. In general, it is safer to use abbreviatory devices that leave at least some local trace of what was left out.
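
To make the resolution of relative URIs concrete, here is a minimal Java sketch (not TF code) that resolves a fragment-only reference and a relative file name against an assumed xml:base value with java.net.URI:

import java.net.URI;

public class BaseResolution {
    public static void main(String[] args) {
        // assumed xml:base value; any ontology document URL would do
        URI base = URI.create("http://tfs.cc/owl/TFS.owl");
        // an orphan fragment as it might appear in an attribute value
        System.out.println(base.resolve("#Concept"));   // http://tfs.cc/owl/TFS.owl#Concept
        // a relative file name resolves against the base document's directory
        System.out.println(base.resolve("TFTop.owl"));  // http://tfs.cc/owl/TFTop.owl
    }
}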

TF3 encoding

The TF factor utility defines a least common denominator encoding specific to TF called TF3 encoding. TF3 encoding should only use characters in the intersection of the XML QName, IRI and Turtle name non-reserved character sets, so it should survive both as an XML or Turtle QName and as an IRI. On the minus side, a TF3 encoded string may be longer and less legible than the original. It is also more ambiguous, because all the characters are alphanumerics.

URL encoding, aka percent encoding, puts a percent sign in front of a two-digit hex number. A character with more than two significant hex digits is first converted to its byte sequence in UTF-8. (An online converter can be found here.) Turtle (like Java) codes Unicode characters with the prefix \u followed by the codepoint in four hex digits (using leading zeroes for shorter numbers). TF3 encoding consists of URL encoding with the percent sign replaced by the string u00 . Equivalently, it is Turtle encoding on top of UTF-8, without the beginning backslash. (Backslash is reserved in XML QNames.) The XML reserved characters and the dot character, which is reserved in Turtle qualified names, are always TF3 encoded (dot as u002e). Hyphen and underscore, used as expression and term separators in TF descriptive URIs, are unreserved word characters for all concerned, so they are not encoded. For no good reason, at least until recently, Turtle local names used to exclude dots, while SPARQL allows '.'s in names in all positions apart from the first or last. The Protege Turtle reader accepts dots in resource names, but not commas. It would be as well for Turtle not to reserve its punctuation characters, since they are conventionally separated by whitespace anyway.

For definiteness, we may stipulate that URL encodings of characters in TF shall be in uppercase (%A0, not %a0). Case does not matter for URL encoding itself, but case sensitive tools that do not interpret the encoding may mind. All the factor utility --tf3encode command does is percent encode the above mentioned problem characters and escape the percent character as u00. You must say --urlencode --tf3encode to do both (urlencode applies first) and --urldecode --tfdecode to undo both (urldecode applies last).
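
The following Java sketch shows the idea of TF3 encoding under simplifying assumptions: the set of problem characters below is only a sample, and unlike the sketch, the actual factor utility need not encode IRI-legal non-ASCII characters. It is not the actual factor code.

import java.nio.charset.StandardCharsets;

public class Tf3Encode {
    // assumed sample of problem characters; the real set covers the
    // reserved characters of XML QNames, IRIs and Turtle prefixed names
    static final String PROBLEM = " /?#[]@%.:,";

    static String tf3encode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            String chunk = new String(Character.toChars(cp));
            if (cp < 128 && PROBLEM.indexOf(cp) < 0) {
                out.append(chunk);                       // unreserved characters pass through
            } else {
                // percent encoding of the UTF-8 bytes, with % spelled as u00 (uppercase hex)
                for (byte b : chunk.getBytes(StandardCharsets.UTF_8)) {
                    out.append(String.format("u00%02X", b));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(tf3encode("exp/foo.bar"));    // expu002Ffoou002Ebar
    }
}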

TF namespaces

Besides ontologies as manually prefabricated named subsets, TF uses the query engine to create and retrieve subsets by description. Query by description has slower response time than a prefabricated set, but the results can be fresher.

Punning

The activity of assigning properties to vocabulary items like classes and properties is known as metamodeling or punning in OWL jargon.

Unfortunately, OWL 1.0 DL requires individuals, classes and properties to be disjoint, so punning is not allowed in OWL 1.0 DL. The only way allowed in OWL 1.0 DL to give properties to classes and properties is with annotation properties. This is not good enough, in that OWL excludes annotation properties from DL reasoning. Inferences between properties and their puns must therefore happen under RDF semantics outside OWL reasoning. (As a matter of fact, many RDF and OWL reasoners seem to support querying annotations.) See Motik 2007.

To conform with OWL 1.0 DL, TF invented a punning convention that duplicates class and property names with paronymous (derivative) individual representative names for instances, in order to be able to assign properties (such as terms) to classes through these representative instances. The naming convention is that if a repository's ontology namespace is foo , classes and properties are in that namespace. The class representative namespace is foo0 and the individual namespace is foo1 . The relation between the class ontology and the matching instance ontology in TF was originally made into a systematic paronymy rather than full homonymy because OWL 1 DL does not allow punning.

In OWL 2.0, class/property-instance homonymy (called punning or metamodeling in ontology jargon) is allowed, so the class/property name and the instance name can be the same. Since direct punning of classes and properties is allowed, the TF punning convention is no longer needed. The TF Schema drops it as of TF version 3.9. The TF punning convention is documented here for backward compatibility.

The TF punning name convention is as follows. Assume ns:Foo stands for a general concept as an owl:Class. ns0:Foo names its first order pun, or metamodeling representative, from which to hang properties of the class. ns1:Foo stands for a bona fide individual member of a countable class. Singular or plural count entities, like individual players or teams, are in ns1 namespace. Abstract noncount entities, like names of concepts, languages or subject fields, are in ns0 namespace. Roughly, ns1 houses (more) concrete things and ns0 (more) abstract things. This differential treatment goes with the fact that countries form a partOf hierarchy (USA is a part of America), while languages or domains form a subclassOf hierarchy (American English is a subclass of English). Admittedly, rather arbitrary, but there it is.

The TFS prolog

The current TFS entity and namespace declarations in TFS.owl and TFTop.owl are defined in the schema prologs as follows. The entity declarations are used to abbreviate namespaces in XML attribute value strings; the prefixes do the same in element and attribute names. (Different abbreviatory tricks are needed in different places thanks to XML syntax.)

<!DOCTYPE rdf:RDF [ The entity declarations go in the DOCTYPE element.
<!ENTITY isocat "http://isocat.org#" > ISO data categories (in TF)
<!ENTITY tfs "http://tfs.cc/" > TF home
<!ENTITY meta "http://tfs.cc/meta/" > TF administrative classes and properties
<!ENTITY meta1 "http://tfs.cc/meta/" > metalanguage instances
<!ENTITY term "http://tfs.cc/term/" > term classes and properties
<!ENTITY ont "http://tfs.cc/ont/" > concept classes and properties
<!ENTITY exp "http://tfs.cc/exp/" > expression classes and properties
<!ENTITY sign "http://tfs.cc/sign/" > sign classes and properties
<!ENTITY syn "http://tfs.cc/syn/" > form classes and properties
<!ENTITY sem "http://tfs.cc/sem/" > meaning classes and properties
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" > xml schema datatypes
<!ENTITY owl "http://www.w3.org/2002/07/owl#" > owl namespace
<!ENTITY owl2xml "http://www.w3.org/2006/12/owl2-xml#" > owl xml namespace (not TF)
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" > rdf schema namespace (not TF)
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" > rdf namespace (not TF)
]>
<rdf:RDF The XML namespace prefix declarations are in the document root RDF element
xmlns="&tfs;owl/TFS.owl#" default namespace (not used)
xml:base="&tfs;/owl/TFS.owl" the xml:base element (only used for the ontology element below)
xmlns:tfs="http://tfs.cc/"
xmlns:meta="&tfs;meta/"
xmlns:exp="&tfs;exp/"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:owl2xml="http://www.w3.org/2006/12/owl2-xml#"
xmlns:term="&tfs;term/"
xmlns:sign="&tfs;sign/"
xmlns:syn="&tfs;syn/"
xmlns:sem="&tfs;sem/"
xmlns:ont="&tfs;ont/"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:isocat="http://isocat.org#">
<owl:Ontology rdf:about=""> The TFS ontology element. The rdf:about attribute value is resolved (filled in) from xml:base
<owl:versionInfo>TF-Schema version 2.0 15.12.2010</owl:versionInfo> current version in an owl:versionInfo element.
</owl:Ontology>

The xml:base attribute is used by ontology editors as a shortcut to find which ontology a document contains (to avoid having to parse and load the ontology and look for ontology resource descriptions inside), so it should be kept as is.

New names for items of each type in the same ontology become new URI fragments; the rest of the URI is kept unchanged. New ontology providers or TF sites have a different scheme and path (the part identified by the entity tfs above). The most fixed part of the naming convention is the part of the URI path just before the fragment, which indicates the type of item named by the URI. Each site can choose URI prefixes at will. There is no way of enforcing conventions there, as nice as that might be.

Namespace exp holds special language expression vocabulary, namespace ont concept related vocabulary, and namespace term term related vocabulary. Namespace syn is for general language grammar (both syntax and morphology, à la Peirce), sem is for TF general language semantics, and sign for signs (including word senses). Items in the sem namespace support natural language processing such as automatic generation of definitions. The TF general language semantics is described in the section on semantics . Special language concepts are under ont:Concept . Namespace meta is for shared administrative or metamodeling vocabulary.

Namespace prefixes are not fixed by the XML namespace standard, so the TFS prefixes are only a TF recommendation. Each repository may choose its own prefixes. The repository's favorite prefixes can be made public in the repository's ont-policy.rdf file and in home ontology documents.

TF property names follow the naming convention that object properties (those that take other resources as values) are named in converse pairs hasPropertyX and propertyXOf . Converse relationships must be declared explicitly (the naming convention does not do the job, it is just mnemonic). What is more, converses do not come about by the declaration alone. Both directions must be asserted in the ontology, or a reasoner must be used to close the ontology under converses. For this reason, a TF terminology may not look like much when viewed raw in an editor like Protege. To make the links implied by the schema visible, the ontology must first be classified by some reasoner. (Protege has FaCT++ built in for this.)
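
The following Jena sketch (assuming Jena 2.x package names and hypothetical example.org property names that follow the hasX / xOf convention) shows how asserting only one direction plus an owl:inverseOf declaration leaves the converse direction to be filled in by a rule reasoner:

import com.hp.hpl.jena.ontology.ObjectProperty;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;

public class ConverseDemo {
    public static void main(String[] args) {
        String NS = "http://example.org/term/";              // hypothetical namespace
        // in-memory model backed by the OWL Micro rule reasoner (handles owl:inverseOf)
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);

        ObjectProperty hasTerm = m.createObjectProperty(NS + "hasTerm");
        ObjectProperty termOf  = m.createObjectProperty(NS + "termOf");
        hasTerm.addInverseOf(termOf);                        // declare the converse pair explicitly

        Resource concept = m.createResource(NS + "Concept1");
        Resource term    = m.createResource(NS + "Term1");
        m.add(concept, hasTerm, term);                       // assert only the direct direction

        // the converse direction is entailed by the reasoner, not asserted
        System.out.println(m.contains(term, termOf, concept));   // true
    }
}

Without the inference model (plain OntModelSpec.OWL_MEM), the last line would print false, which is the raw view mentioned above.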

Resources vs. literals

Any graph can have internal and leaf nodes. In OWL, a type distinction is made between resources and literals. Resources can have properties, including identifiers (like URIs) associated with them. Literals have no properties, so they are always leaves. A resource's URI is (or at least should be) the place where that resource is at home. The URI, and the information associated with it, is (or should be) enough to identify the resource and distinguish it from other resources. A literal has no home base; its meaning depends on where it occurs. From the point of view of unambiguity, a URI might seem a perfect candidate for a standardised property picklist value. In practice, however, people prefer short natural language like identifiers. Property value picklists are usually literals formed of suggestive (English) keywords or abbreviations, such as the ISO standard language and country codes.

Given a property URI as context, the difference between a code and a URI as the value is, from the point of view of unambiguity, a technicality. For example, codes such as the ISO standard language or country codes can be literal picklist values of datatype properties. They get identified in TF uniquely as pairs of property URI and associated value. A property URI and value, related like a URL and a URL fragment, is just as good an identifier for the value as a separate URI for the value alone.

One difference between literal picklists and picklist resources is how they are documented. URIs are documented by their own TF entry. Picklist values are documented in a more roundabout way.

Another difference is that literal picklist values are not localized by TF. Only resources identified by a URI are subject to localization. Literals are literally literals, they are what they are. If a picklist property needs localizing, make the property an object property and its values TF resources.

In coding content to OWL, one must decide what content to represent explicitly in OWL and what to leave implicit, for people or external processors to interpret. The decision depends on whether one plans to do OWL reasoning on the content. For instance, to express that an item can have either one of the part of speech codes N and V, perhaps the best option is just to allow multiple category properties. Otherwise, the join N|V might be defined explicitly as an OWL oneOf list. Then the join is no longer a literal but an enumerated class. A third alternative is to just add N|V as an additional atomic part of speech code. Then OWL can tell nothing about the relation of the new code to its parts. That reasoning must be done elsewhere, e.g. in querying.
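
For the second option, the following Jena sketch (again with Jena 2.x package names and a hypothetical pos property in an example.org namespace) spells out the join N|V as an owl:oneOf enumeration of literals used as the range of a datatype property:

import com.hp.hpl.jena.ontology.DataRange;
import com.hp.hpl.jena.ontology.DatatypeProperty;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.RDFNode;

public class PosJoin {
    public static void main(String[] args) {
        String NS = "http://example.org/term/";              // hypothetical namespace
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);

        DatatypeProperty pos = m.createDatatypeProperty(NS + "pos");
        // the join N|V as an enumerated data range rather than a new atomic code
        DataRange nOrV = m.createDataRange(
            m.createList(new RDFNode[] { m.createLiteral("N"), m.createLiteral("V") }));
        pos.addRange(nOrV);

        m.write(System.out, "TURTLE");                       // print the resulting triples
    }
}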

Object vs. datatype properties

The OWL type distinction between resources and literals also divides OWL properties into object and datatype properties. (It is not possible to define an OWL property that subsumes both types of properties.) Object properties can have inverses; datatype properties cannot, because literals do not have properties.

TF follows the convention of naming a direct object property with a name of form hasSomeProperty and its inverse somePropertyOf . It is a matter of taste and convenience which direction is direct and which inverse. For instance, the inverse property term:referentOf is equivalent to the direct property term:hasTerm . Datatype properties have lowercase initial names not starting with has or ending with Of , like someAttribute . The convention in TF is that hasSomeProperty is a direct property and somePropertyOf is the inverse, so the one carrying the owl:inverseOf property is the inverse member of the pair.

Formatted text and the XMLLiteral datatype

TF string literals such as texts and baseforms can have datatype rdf:PlainLiteral by default. To indicate formatted text specially, property exp:text has a subproperty exp:textXML with object datatype rdf:XMLLiteral . This datatype can contain any well-formed XML, so exp:textXML can include formatting markup, mathematical formulas in MathML, even inline images in SVG (should one want them). Analogously, the exp:baseForm property of an expression has a subproperty exp:baseFormXML for those cases where formatting is really part of the baseform, for instance subscripts, trademarks or other markup. Comments too can contain formatting. The following is an example of Japanese written in Ruby (Rubi) characters in HTML annotation.

According to Section 2.8 XML Literals , XML literals are written in RDF/XML as content of a property element (not a property attribute) and indicated using the rdf:parseType="Literal" attribute on the containing property element.

<owl:Class rdf:about="&tfs;exp/Text">
  <rdfs:subClassOf rdf:resource="&tfs;exp/Form"/>
  <rdfs:comment xml:lang="en">Text reifies an XML parsable text string in some language (whatever the string means)</rdfs:comment>
  <rdfs:comment rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"><span xmlns="http://www.w3.org/1999/xhtml" class="value name" datatype="rdf:XMLLiteral" title="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">This is a <strong>formatted</strong> text string in some language (whatever the string <em>means</em>)</span></rdfs:comment>
</owl:Class>

The TF HTML writer is set up to parse an XMLLiteral in the XHTML namespace (an XML element having the namespace attribute xmlns="http://www.w3.org/1999/xhtml" as in the example above) into HTML, so it will display right in a browser. Conversely, the HTML parser will write such an element back into an XMLLiteral string as above. For example, the formatted XMLLiteral string above gets converted to this by the TF HTML writer.

  • rdfs:comment
    • This is a formatted text string in some language (whatever the string means)

The rdf:XMLLiteral datatype was at risk in OWL 2 (see http://www.w3.org/2007/OWL/wiki/At_Risk#rdf:XMLLiteral_.28Ongoing.29), but it got included; see http://www.w3.org/2007/OWL/wiki/Quick_Reference_Guide.

Address redirection

The Semantic Web addressing orthodoxy is to serve a document from its URL. But documents at given URLs may not always be easily accessible to all, and certainly not editable by all. Often one needs to hold copies of a document obtained from a given URL. So we need a way to keep a mapping from the URL to the copy we want to access at a given time.

A whole host of solutions have been invented in the websphere for web address resolution, forwarding or URL redirection . Jena invented location mappings for remapping RDF documents and ontology policy files for remapping OWL ontologies. (Protege 3.1 used ont-policy files. Protege 3.2 had a home-grown repository mechanism.) XML provides XML catalogs. (Protege 4 uses them.) In addition, client or server side scripts, web servers and application containers have URL mapping or rewriting facilities. Finally, domain names can get forwarded.

There are many ways of redirecting resource identifiers (URIs) and web addresses (URLs) in TF. One is specific for TF.

  • TF mappings
  • Jena ont-policy documents
  • webpage forwarding with apache php
  • webpage forwarding with Tomcat servlets
  • Tomcat url rewriting
  • Apache url rewriting

TF addresses

The TF address space includes URLs, store descriptions, and their aliases.

A store description is a key-value string matching regular expression --(asm|name|params|pass|path|service|site|user)=\\S+(\\s+--(asm|name|params|pass|path|service|site|user)=\\S+)*. A store description describes a dataset, named graph, or sparql service.

A dataset description has form --asm=ASSEMBLY where ASSEMBLY is the address of a jena dataset assembly description document. It gets compiled into a dataset.

A graph description is of form --asm=ASSEMBLY --name=NAME. A graph description identifies a named graph in a dataset. It qualifies as an address for a copy operation, and it can be included as a repository in the query dataset. A repository argument of form --named="--asm=ASSEMBLY --name=NAME" introduces the graph description as a named graph into the query dataset. This named graph is referred to in the query by the IRI ASSEMBLY:NAME.

A DAV dataset or graph description can also contain options --site=SITE --path=PATH --user=USER --pass=PASS indicating the DAV site URL, the path to the collection, and user credentials. Missing values are defaulted from the current TF conf.

A SPARQL protocol service description has form --service=ENDPOINT --params=PARAMS where --service indicates the service endpoint address and --params optionally supplies further parameters to the service request. These parameters can be SPARQL 1.1 protocol compliant ones like &default-graph-uri=URI or ones particular to a given service (like TF).
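
To illustrate the shape of these descriptions, here is a small Java sketch (not the TF parser) that checks a string against the store description expression quoted above and collects its key-value pairs; the description in main is a hypothetical graph description using the tdb assembly nickname.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoreDescription {
    // the key-value syntax quoted above
    static final Pattern STORE = Pattern.compile(
        "--(asm|name|params|pass|path|service|site|user)=\\S+"
        + "(\\s+--(asm|name|params|pass|path|service|site|user)=\\S+)*");
    static final Pattern PAIR = Pattern.compile("--(\\w+)=(\\S+)");

    static Map<String, String> parse(String desc) {
        if (!STORE.matcher(desc).matches())
            throw new IllegalArgumentException("not a store description: " + desc);
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(desc);
        while (m.find()) map.put(m.group(1), m.group(2));
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("--asm=tdb --name=http://tfs.cc/owl/TFS.owl"));
        // {asm=tdb, name=http://tfs.cc/owl/TFS.owl}
    }
}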

TF aliases

A TF alias is a free form shorthand text resolved to a TF address by the TF location mapper. The location mapper is a general finite state string rewriting mechanism that can factor any string at all into another string, not just web URLs. It serves to localize and globalize TF addresses much like the rest of TF serves to localize and globalize TF resource names. (Term 'alias' is used here indiscriminately for a TF mapping rule or a nickname defined by one.) Location mappings are kept in RDF files. There is a system wide mapping file (or map). Alternative maps can be written and applied by users.

TF alias mappings are modeled after Jena location mappings . They can be freely mixed with and kept in the same file as Jena location mappings. If there are Jena location mappings in the same file, the Jena mappings apply first. They work the usual Jena way, either succeeding or failing for good. Except for the first conf file loaded at TF initialization, TF alias documents are read as TF ontology documents, using already available TF aliases. Because they are ontology documents, they can contain imports triples that import further location mappings.

TermFactory location mapping files can import other such files. A mapping can also be specified as a TF list file with suffix specified in TF_LIST_EXT (by default, .tsv ). Each line in a location mapping list file is an address of a location mapping file (including other location mapping list files).

TermFactory uses the tfget facility and location mappings to redirect ontology URLs. Any convenient nickname, for instance the alias foo , can be set for http://tfs.cc/owl/TFS.owl with

[] tf:mapping [ tf:name "foo" ; tf:altName "http://tfs.cc/owl/TFS.owl" ] .

and retrieved with a query string of form

http://tfs.cc/TermFactory/query?u=foo

The URL to be sought is given here as the value of the option named u(rl) . What the TF query service does with a request of this form is to try to resolve the address using TF aliases until a hit is found.

By default, TF alias mappings substitute names of TF configuration properties with their current values:

tfget -t dav:/home/TF_USER/foo http://localhost/dav/home/guest/foo

URL encoding in location mappings

When a TF alias substitutes parameters in a TF webapp url querystring, the parameters might need to be URL (percent) encoded to protect URL reserved characters in it. To help writing such query aliases, this special dispensation is made for TF pattern mappings: if the input to a TF pattern mapping contains no question mark and the output does, the values of pattern variables of form ?<number> are url encoded. Normal pattern replacement variables of form $<number> are not url encoded. No encoding happens if the input also contains a question mark. Example: query alias

[] tf:mapping [ tf:pattern "(.*)-lion[?]?(.*)" ; tf:altPattern "http://localhost/TermFactory/query?q=file%3Aetc%2Fsparql%2Flion.sparql&r=lion-?1$2" ] .

does the following encodings:

tfget -t "'quoted'-lion&f2=JSON" http://localhost/TermFactory/query?i=fi&q=file%3Aetc%2Fsparql%2Flion.sparql&a=1&r=tf-menus&r=lion-%27quoted%27&f2=JSON
tfget -t "'quoted'-lion?&f2=JSON" http://localhost/TermFactory/query?q=file%3Aetc%2Fsparql%2Flion.sparql&r=lion-'quoted'&f2=JSON

In the first map query, the value of variable ?1 is url encoded, but that of $2 is not. In the second query, no url encoding happens, because there is a question mark in the input.
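
The dispensation can be mimicked with a few lines of Java. The sketch below makes simplifying assumptions (the mapping is given as two plain strings, only ?n and $n variables are handled, and the extra parameters that the real alias adds are left out); it is not the TF location mapper itself.

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryAlias {
    // apply one tf:pattern / tf:altPattern pair to an input address
    static String apply(String pattern, String altPattern, String input)
            throws UnsupportedEncodingException {
        Matcher m = Pattern.compile(pattern).matcher(input);
        if (!m.matches()) return input;
        // encode ?n variables only if the input has no question mark but the output does
        boolean encode = !input.contains("?") && altPattern.contains("?");
        String out = altPattern;
        for (int g = 1; g <= m.groupCount(); g++) {
            String raw = m.group(g);
            String enc = encode ? URLEncoder.encode(raw, "UTF-8") : raw;
            out = out.replace("?" + g, enc);   // ?n : url encoded value
            out = out.replace("$" + g, raw);   // $n : raw value
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(apply("(.*)-lion[?]?(.*)",
            "http://localhost/TermFactory/query?q=file%3Aetc%2Fsparql%2Flion.sparql&r=lion-?1$2",
            "'quoted'-lion&f2=JSON"));
        // prints ...query?q=file%3Aetc%2Fsparql%2Flion.sparql&r=lion-%27quoted%27&f2=JSON
    }
}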

The mappings are given in a Jena RDF format location mapping file whose name is specified in the conf . For historical reasons, it is called home:/etc/location-mapping.n3 (but feel free to change the name). The TF configuration file, the location mapping file, and other critical TF settings are stored in the TF_HOME directory and are accessible inside TF with TF_FILE urls of form home:/path that get resolved by TF against TF_HOME. A selection of TF configuration files under TF_HOME can be queried through the TermFactory services using the relative file url; others are hidden. As a rule, TF_HOME contents cannot be edited from the web. It is also possible to place selected configuration files in the Tomcat webapp directory, in which case they become editable on the web using the TermFactory webapp dav servlet.

Here is a sample TF location mapping configuration file.

## EXAMPLE @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . @prefix lm: <http://jena.hpl.hp.com/2004/08/location-mapping#> . @prefix tf: <http://tfs.cc/alias/> . # HTML settings [] a <http://tfs.cc/meta/Entry> ; <http://tfs.cc/meta/style> <home:/etc/confs/lm.properties> . # Application location to alternative location mappings. # # + Order does not matter. # + The location mapping parser looks for lm:mapping properties # and uses the object value so this can be written in several different styles. # # The translation algorithm is: # # 1 - Exact mappings: these are tried before attempting a prefix match. # 2 - By prefix: find the longest matching prefix # 3 - Use the original if no alternative. # Use N3's , (multiple objects => multiple statements of same subject and predicate) # Note the commas ## -- Example 1 ## [] lm:mapping ## [ lm:name "file:foo.n3" ; lm:altName "file:etc/foo.n3" ] , ## [ lm:prefix "file:etc/" ; lm:altPrefix "file:ETC/" ] , ## [ lm:name "file:etc/foo.n3" ; lm:altName "file:DIR/foo.n3" ] , ## . ## -- Example 2 # This is exactly the same graph using the ; syntax of N3 # Multiple statements with the same subject - and we used the same predicate. ## [] lm:mapping [ lm:name "file:foo.n3" ; lm:altName "file:etc/foo.n3" ] ; ## lm:mapping [ lm:prefix "file:etc/" ; lm:altPrefix "file:ETC/" ] ; ## lm:mapping [ lm:name "file:etc/foo.n3" ; lm:altName "file:DIR/foo.n3" ] ; ## . ## -- Example 3 # Different graph - same effect. The fact there are different subjects is immaterial. ## [] lm:mapping [ lm:name "file:foo.n3" ; lm:altName "file:etc/foo.n3" ] . ## [] lm:mapping [ lm:prefix "file:etc/" ; lm:altPrefix "file:ETC/" ] . ## [] lm:mapping [ lm:name "file:etc/foo.n3" ; lm:altName "file:DIR/foo.n3" ] . ## -- TF mappings # localhost overrides tfs.cc except for dav [] tf:mapping [ tf:prefix "http://tfs.cc" ; tf:altPrefix "http://localhost/tfs.cc" ] . # try dav copy first #[] tf:mapping [ tf:pattern "http://localhost/(([^d]|d[^a]|da[^v]).*)" ; tf:altPattern "+http://localhost/dav/localhost/$1" ] . #[] tf:mapping [ tf:prefix "file:///" ; tf:altPrefix "+http://localhost/dav/" ] . # then original #[] tf:mapping [ tf:prefix "+http://localhost/dav/localhost/" ; tf:altPrefix "+http://localhost/" ; ] . #[] tf:mapping [ tf:prefix "+http://localhost/dav/home/" ; tf:altPrefix "+file:///home/" ; ] . ## prefix conventions (home:/ built in for TF_HOME) [] tf:mapping [ tf:prefix "host:/" ; tf:altPrefix "http://localhost/" ] . # host: is mapped to local host document root [] tf:mapping [ tf:prefix "dav:/" ; tf:altPrefix "http://localhost/dav/home/TF_USER" ] . # dav: is mapped to local host dav user dir [] tf:mapping [ tf:prefix "app:/" ; tf:altPrefix "http://localhost/TermFactory/" ] . # TF: is mapped to local webapp root # lion: localization ontology prefix [] tf:mapping [ tf:name "lion" ; tf:altName "lion:" ] . # map prefix "lion+" to "lion-" [] tf:mapping [ tf:pattern "lion:(.*/)([^/]+)" ; tf:altPattern "$1lion-$2" ] . # map lion:foo/bar to lion ontology name foo/lion-bar ## prefix mappings [] tf:mapping [ tf:prefix "ont:" ; tf:altPrefix "http://tfs.cc/ont/" ] . [] tf:mapping [ tf:prefix "exp:" ; tf:altPrefix "http://tfs.cc/exp/" ] . # nicknames for styles [] tf:mapping [ tf:name "grid" ; tf:altName "home:/etc/confs/grid.properties" ] . [] tf:mapping [ tf:name "wnsem" ; tf:altName "home:/etc/confs/wnsem.properties" ] . 
[] tf:mapping [ tf:name "wnsyn" ; tf:altName "home:/etc/confs/wnsyn.properties" ] . [] tf:mapping [ tf:name "wnsign" ; tf:altName "home:/etc/confs/wnsign.properties" ] . [] tf:mapping [ tf:name "frame" ; tf:altName "home:/etc/confs/frame.properties" ] . [] tf:mapping [ tf:name "kkmax" ; tf:altName "home:/etc/confs/kkmax.properties" ] . # aliases for xsl stylesheets [] tf:mapping [ tf:name "tf2html2.xsl" ; tf:altName "home:/etc/skins/tf2html2.xsl" ] . [] tf:mapping [ tf:name "html2tf2.xsl" ; tf:altName "home:/etc/skins/html2tf2.xsl" ] . # nicknames for templates #[] tf:mapping [ tf:name "sem" ; tf:altName "home:/etc/templates/sem.ttl" ] . #[] tf:mapping [ tf:name "syn" ; tf:altName "home:/etc/templates/syn.ttl" ] . [] tf:mapping [ tf:name "ont" ; tf:altName "home:/etc/templates/ont.ttl" ] . [] tf:mapping [ tf:name "exp" ; tf:altName "home:/etc/templates/exp.ttl" ] . [] tf:mapping [ tf:name "term" ; tf:altName "home:/etc/templates/term.ttl" ] . [] tf:mapping [ tf:name "sem" ; tf:altName "home:/etc/templates/sem.ttl" ] . [] tf:mapping [ tf:name "syn" ; tf:altName "home:/etc/templates/syn.ttl" ] . [] tf:mapping [ tf:name "sign" ; tf:altName "home:/etc/templates/sign.ttl" ] . [] tf:mapping [ tf:name "lite" ; tf:altName "home:/etc/templates/lite.ttl" ] . [] tf:mapping [ tf:name "lm" ; tf:altName "home:/etc/templates/lm.ttl" ] . [] tf:mapping [ tf:name "df" ; tf:altName "home:/etc/templates/df.ttl" ] . [] tf:mapping [ tf:name "wf" ; tf:altName "home:/etc/templates/wf.ttl" ] . [] tf:mapping [ tf:name "gfc" ; tf:altName "home:/etc/templates/gfc.ttl" ] . [] tf:mapping [ tf:name "gft" ; tf:altName "home:/etc/templates/gft.ttl" ] . # nicknames for schemas [] tf:mapping [ tf:name "tfs" ; tf:altName "http://tfs.cc/owl/TFS.owl" ; rdfs:comment "TermFactory schema"@en, "Termitehtaan skeema"@fi, "Termfabrikens skema"@sv ] . [] tf:mapping [ tf:name "wns" ; tf:altName "home:/owl/wn/TFwn.owl" ] . [] tf:mapping [ tf:name "dbps" ; tf:altName "host:/owl/dbp/dbpedia_3.8.owl.rdf" ; rdfs:comment "DBPedia ontology schema"@en, "DBPedian ontologiaskeema"@fi, "DBPEdias ontologiskema"@sv ] . [] tf:mapping [ tf:name "frames" ; tf:altName "home:/owl/TFFrame.ttl" ; rdfs:comment "TermFactory lexical frames"@en, "Termitehtaan sanakehykset"@fi, "Termfabrikens ordramar"@sv ] . [] tf:mapping [ tf:name "lms" ; tf:altName "home:/etc/menus/lm/lms.ttl" ] . [] tf:mapping [ tf:name "top" ; tf:altName "home:/owl/TFTop.owl" ; rdfs:comment "TermFactory top ontology"@en, "Termitehtaan huippuontologia"@fi, "Termfabrikens toppontologi"@sv ] . # nicknames for prefix models [] tf:mapping [ tf:name "dbpp" ; tf:altName "home:/etc/sparql/dbp.ttl" ; rdfs:comment "DBPedia ontology schema"@en, "DBPedian ontologiaskeema"@fi, "DBPEdias ontologiskema"@sv ] . # aliases for localization ontologies #[] tf:mapping [ tf:name "lion-tfs" ; tf:altName "home:/owl/lion-TFS.owl" ; rdfs:comment "TF Schema localizations" ] . [] tf:mapping [ tf:name "lion-menu" ; tf:altName "home:/etc/menus/lion-menu.ttl" ; rdfs:comment "dialog items" ] . [] tf:mapping [ tf:name "lion-frames" ; tf:altName "home:/owl/TFFrame.ttl" ] . [] tf:mapping [ tf:name "lion-lms" ; tf:altName "home:/etc/menus/lm/lion-lms.ttl" ] . [ rdf:type tf:Alias ; tf:mapping [ tf:prefix "x-exp" ; tf:altPrefix "http://localhost/TermFactory/query?a=1&q=file%3aio%2fsparql%2fselect-exp-translations.sparql&r=http%3a%2f%2ftfs.cc%2fowl%2ftf-TFS.owl&f=TSV&i=exp" ] ] . # nicknames for localization collections [] tf:mapping [ tf:name "wnl" ; tf:altName "home:/owl/wn/" ] . 
[] tf:mapping [ tf:name "kkl" ; tf:altName "home:/cnv/kk/" ] . # nicknames for assemblies [] tf:mapping [ tf:name "dav" ; tf:altName "home:/etc/asm/dav.ttl" ] . [] tf:mapping [ tf:name "tdb" ; tf:altName "home:/etc/asm/tdb.ttl" ] . [] tf:mapping [ tf:name "text-tdb" ; tf:altName "home:/etc/asm/text-tdb.ttl" ] . [] tf:mapping [ tf:name "sdb" ; tf:altName "home:/etc/asm/sdb.ttl" ] . [] tf:mapping [ tf:name "dbp" ; tf:altName "home:/etc/asm/dbp.ttl" ] . [] tf:mapping [ tf:name "edit" ; tf:altName "home:/etc/asm/edit.ttl" ] . #[] tf:mapping [ tf:name "icd" ; tf:altName "home:/etc/asm/icd.ttl" ] . #[] tf:mapping [ tf:name "index" ; tf:altName "home:/etc/asm/index.ttl" ] . #[] tf:mapping [ tf:name "owlim" ; tf:altName "home:/etc/asm/owlim.ttl" ] . #[] tf:mapping [ tf:name "gflex" ; tf:altName "home:/etc/asm/gflex.ttl" ] . #[] tf:mapping [ tf:name "wordnet" ; tf:altName "home:/etc/asm/wordnet.ttl" ] . #[] tf:mapping [ tf:name "wnindex" ; tf:altName "home:/etc/asm/wnindex.ttl" ] . [] tf:mapping [ tf:name "dummy" ; tf:altName "home:/etc/asm/dummy.ttl" ] . # nicknames for stores [] tf:mapping [ tf:name "wnstore" ; tf:altName "--asm=tdb" ] . [] tf:mapping [ tf:name "icdstore" ; tf:altName "--asm=tdb" ] . # empty prefixes [] tf:mapping [ tf:prefix ":" ; tf:altPrefix "" ]. [] tf:mapping [ tf:prefix "lst:" ; tf:altPrefix "" ]. [] tf:mapping [ tf:prefix "from:" ; tf:altPrefix "" ]. # resource aliases [ rdf:type tf:Alias ; tf:mapping [ tf:prefix "http://tfs.cc/lang/" ; tf:altPrefix "http://localhost:8080/TermFactory/query?r=http%3a%2f%2ftfs.cc%2fowl%2flang%2fTFLang.owl&U=" ] ] . [ rdf:type tf:Alias ; tf:mapping [ tf:pattern "http://tfs.cc/ctry/(.*)" ; tf:altPattern "http://localhost:8080/TermFactory/query?uri=$1&repo=http%3a%2f%2ftfs.cc%2fowl%2fctry%2fTFCtry.owl" ] ] . [ rdf:type tf:Alias ; tf:mapping [ tf:pattern "idx:exp:(.*).html" ; tf:altPattern "http://localhost/TermFactory/query?D=idx%3Aexp1%3A$1.lst&r=idx%3Aexp2%3A$1.lst&f=HTML&T=exp&a=1" ] ] . [ rdf:type tf:Alias ; tf:mapping [ tf:pattern "idx:exp1:(.*).lst" ; tf:altPattern "http://localhost/TermFactory/query?i=$1&q=home:/etc/scripts/select-designations-by-iri-r.sparql&r=idx%2b&f=TSV&z=.lst" ] ] . [ rdf:type tf:Alias ; tf:mapping [ tf:pattern "idx:exp2:(.*).lst" ; tf:altPattern "http://localhost/TermFactory/query?i=$1&q=home:/etc/scripts/select-graphs-for-designations-by-iri-r.sparql&r=idx%2b&f=TSV&z=.lst" ] ] . # mediawiki ## Wordnet #[] tf:mapping [ tf:prefix "Wn30:" ; tf:altPrefix "http://localhost/tfs.cc/wn30/wn30entry.php?e=" ] . # php entry #[] tf:mapping [ tf:prefix "Wn30:word-" ; tf:altPrefix "http://localhost/TermFactory/query?r=wnstore&S=wns&T=syn&f=HTML&L=wnl&U=wn30:word-" ] . #[] tf:mapping [ tf:prefix "Wn30:synset-" ; tf:altPrefix "http://localhost/TermFactory/query?r=wnstore&S=wns&T=sem&f=HTML&L=wnl&U=wn30:synset-" ] . #[] tf:mapping [ tf:prefix "Wn30:wordsense-" ; tf:altPrefix "http://localhost/TermFactory/query?r=wnstore&S=wns&T=sign&f=HTML&L=wnl&U=wn30:wordsense-" ] . ### with styles and language tags [] tf:mapping [ tf:pattern "Wn30:word-([^&]*)(.*)" ; tf:altPattern "http://localhost/TermFactory/query?d=1&r=wnstore&Y=wnsyn&U=wn30:word-?1$2" ] . [] tf:mapping [ tf:pattern "Wn30:synset-([^&]*)(.*)" ; tf:altPattern "http://localhost/TermFactory/query?d=1&r=wnstore&Y=wnsem&U=wn30:synset-?1$2" ] . [] tf:mapping [ tf:pattern "Wn30:wordsense-([^&]*)(.*)" ; tf:altPattern "http://localhost/TermFactory/query?d=1&r=wnstore&Y=wnsign&U=wn30:wordsense-?1$2" ] . 
[] tf:mapping [ tf:prefix "wnont-" ; tf:altPrefix "http://localhost/TermFactory/query?r=wnstore&q=home:/etc/scripts/construct-wordnet-synset-entry.sparql&i=" ] . [] tf:mapping [ tf:prefix "wnhyponyms-" ; tf:altPrefix "http://localhost/TermFactory/query?f=TSV&r=wnindex%2b&q=home:/etc/scripts/select-wordnet-hyponyms.sparql&i=" ] . [] tf:mapping [ tf:prefix "wncat-" ; tf:altPrefix "wnont-lst:wnhyponyms-" ; tf:example "wncat-wn30:synset-food_fish-noun-1" ; rdfs:comment "hyponyms of given wordnet synset"@en, "annetun wordnet-synsetin alamerkitykset"@fi, "hyponymer till given wordnet synset"@sv ; ] . ## ICD-10 [] tf:mapping [ tf:prefix "http://tfs.cc/owl/icd10/" ; tf:altPrefix "http://localhost/owl/icd10/" ] . #[] tf:mapping [ tf:prefix "Icd10:" ; tf:altPrefix "http://localhost/TermFactory/query?r=icdstore&f=HTML&U=icd10ont0:" ] . #[] tf:mapping [ tf:prefix "Icd10exp1:" ; tf:altPrefix "http://localhost/TermFactory/query?r=icdstore&T=exp&f=HTML&U=http://tfs.cc/icd10/exp1/" ] . ### with language tags [] tf:mapping [ tf:pattern "Icd10:([^&]*)(.*)" ; tf:altPattern "http://localhost/TermFactory/query?d=2&r=icdstore&f=HTML&U=icd10:?1$2" ] . [] tf:mapping [ tf:pattern "Icd10exp:([^&]*)(.*)" ; tf:altPattern "http://localhost/TermFactory/query?d=2&r=icdstore&T=exp&f=HTML&U=http://tfs.cc/icd10/exp/?1$2" ] . [] tf:mapping [ tf:pattern "Icd10term:([^&]*)(.*)" ; tf:altPattern "http://localhost/TermFactory/query?d=2&r=icdstore&T=term&f=HTML&U=http://tfs.cc/icd10/term1/?1$2" ] . ### truncated ICD-10 iris [] tf:mapping [ tf:pattern "Icd10exp1:(.*[.][*].*)" ; tf:altPattern "Icd10exp1-:$1" ] . [] tf:mapping [ tf:prefix "Icd10exp1-:" ; tf:altPrefix "http://localhost/TermFactory/query?r=icdstore&T=exp&f=HTML&q=home:/etc/scripts/describe-iri-by-iri-r.sparql&i=http://tfs.cc/icd10/exp1/" ] . # query import alias [] tf:mapping [ tf:prefix "query?" ; tf:altPrefix "http://localhost:8080/TermFactory/query?" ] . # test relayed lm query #[] tf:mapping [ tf:name "foo" ; tf:altName "http://localhost:8080/TermFactory/query?url=bar" ] . #[] tf:mapping [ tf:name "bar" ; tf:altName "http://tfs.cc/owl/TFS.owl" ] . [] tf:mapping [ tf:name "bar" ; tf:altName "baz" ] . # menu generation aliases [] tf:mapping [ tf:prefix "gen-lion-" ; tf:altPrefix "http://localhost/TermFactory/query?f=JSONLD&a=1&u=lion-menu&" ; rdfs:comment "generate menu with given settings"@en, "luo valikot annetuilla asetuksilla"@fi, "skapa menuer med givna inställningar"@sv ; tf:example "gen-lion-u=home:/etc/confs/TF_USER.properties&o=webdav+etc/menus/conf/" ] . [] tf:mapping [ tf:prefix "gen-menu-" ; tf:altPrefix "http://localhost/TermFactory/query?f=JSONLD_FLAT&a=1&q=home%3A%2Fetc%2Fsparql%2Fmenu.sparql&" ; rdfs:comment "generate menu with given settings"@en, "luo valikko annetuilla asetuksilla"@fi, "skapa meny med givna inställningar"@sv ; tf:example "gen-menu-r=home:/etc/confs/TF_USER.properties&o=webdav+etc/menus/conf/" ] . # rewrite aliases [] tf:mapping [ tf:prefix "deblank-" ; tf:altPrefix "http://localhost/TermFactory/query?&W=deblank&u=" ] . [] tf:mapping [ tf:prefix "reblank-" ; tf:altPrefix "http://localhost/TermFactory/query?&W=reblank&u=" ] . [] tf:mapping [ tf:prefix "relabel-" ; tf:altPrefix "http://localhost/TermFactory/query?&W=relabel&u=" ] . # sparql service endpoints [] lm:mapping [ lm:name "termfactory" ; lm:altName "http://localhost/TermFactory/sparql" ] . [] lm:mapping [ lm:name "dbplive" ; lm:altName "http://dbpedia-live.openlinksw.com/sparql" ] . [] lm:mapping [ lm:name "factforge" ; lm:altName "http://factforge.net/sparql" ] . 
[] lm:mapping [ lm:name "museum" ; lm:altName "http://museum.ontotext.com/sparql" ] . [] lm:mapping [ lm:name "museum.rdf" ; lm:altName "http://museum.ontotext.com/sparql.rdf" ] . [] lm:mapping [ lm:name "dbpedia" ; lm:altName "http://dbpedia.org/sparql" ] . # gf [] tf:mapping [ tf:name "subjectlabels" ; rdfs:comment "labels for subjects by property and object"@en, "nimikkeet subjekteille ominaisuuden ja objektin mukaan"@fi, "subjektetiketter per egenskap of objekt"@sv ; tf:altName "home:/etc/scripts/construct-labels-by-property-and-object.sparql" ] . [] tf:mapping [ tf:pattern "dbpcat-(.*)" ; tf:example "dbpcat-Edible_fish" ; rdfs:comment "labels for resources in given dbpedia category"@en, "nimikkeet dbpedia-kategorian resursseille"@fi, "etiketter på resurser i given dbpedia-kategori"@sv ; tf:altPattern "http://localhost/TermFactory/query?p=dbpp&q=home%3A%2Fetc%2Fscripts%2Fcollect-labels-for-dbp-categories.sparql&i=dbpcat%3A?1&r=--service=dbpedia" ] . [] tf:mapping [ tf:prefix "namedentries-" ; tf:example "namedentries-dbpcat-Edible_fish" ; rdfs:comment "named terms for labels"@en, "nimetyt viennit nimikkeille"@fi, "namngivna poster per etikett"@sv ; tf:altPrefix "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Fconstruct-named-entries-for-labels.sparql&r=home:/etc/sparql/prefix.ttl&r=" ] . [] tf:mapping [ tf:prefix "blankentries-" ; tf:example "blankentries-dbpcat-Edible_fish" ; rdfs:comment "blank entries for labels"@en, "nimettömät viennit nimikkeille"@fi, "anonyma poster per etikett"@sv ; tf:altPrefix "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Fconstruct-entries-for-labels.sparql&r=home:/etc/sparql/prefix.ttl&r=" ] . [] tf:mapping [ tf:prefix "gframe-" ; tf:example "gframe-blankterms-dbpcat-Edible_fish" ; rdfs:comment "syntactic frames for entries"@en, "lauseopilliset kehykset vienneille"@fi, "syntaktiska ramar till artiklar"@sv ; tf:altPrefix "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Fconstruct-frame-properties.sparql&m=1&r=" ] . # molto [] tf:mapping [ tf:name "paintinglabels" ; rdfs:comment "labels for paintings from molto KRI endpoint"@en, "maalausten nimet molto KRI palvelusta"@fi ; tf:altName "http://localhost/TermFactory/query?f=TURTLE&q=home%3A%2Fetc%2Fscripts%2Fconstruct-painting-labels.sparql&r=http%3A%2F%2Fmuseum.ontotext.com%2Fsparql.rdf" ] . [] tf:mapping [ rdfs:comment "Finnish verbs starting with a"@en , "a:lla alkavat suomen wordnetin verbit"@fi ; tf:altName "http://localhost/TermFactory/query?f=TSV&q=http%3A%2F%2Flocalhost%2Fdav%2Fhome%2Fguest%2Fetc%2Fscripts%2Fselect-fi-wordnet-verbs-by-base-r.sparql&i=%5Ea&l=en&r=wnstore" ; tf:name "a-verbit" ] . [] tf:mapping [ rdfs:comment "syn frames for fi verbs starting with a"@en , "a:lla alkavien suomen wordnetin verbien kehykset"@fi, "syntaktiska ramar till finska wordnet-verb som börjar med a"@sv ; tf:altName "http://localhost/TermFactory/query?q=home:/io%2Fsparql%2Fconstruct-frames-for-fi-wordnet-verbs-by-base-r.sparql&i=%5Ea&l=en&r=wnstore" ; tf:name "a-verbikehykset" ] . # mobster [] tf:mapping [ tf:prefix "wnsanelut-" ; rdfs:comment "word senses for base form pattern in wordnet"@en, "perusmuotohahmoa vastaavia sananmerkityksiä wordnetissä"@fi, "ordbetydelser i wordnet per basformsmönster"@sv ; tf:example "wnsanelut-sepelvaltimotauti" ; tf:altPrefix "http://localhost/TermFactory/query?f=HTML&q=home%3A%2Fetc%2Fscripts%2Fselect-wordnet-senses-by-base-i.sparql&r=wnstore&i=" ] . 
[] tf:mapping [ tf:prefix "icdsanelut-" ; rdfs:comment "terms for base form pattern in ICD-10"@en, "perusmuotohahmoa vastaavia termejä ICD-10:sså"@fi, "ICD-10 -termer per basformsmönster"@sv ; tf:example "icdsanelut-sepelvaltimo" ; tf:altPrefix "http://localhost/TermFactory/query?f=HTML&q=home%3A%2Fetc%2Fscripts%2Fselect-terms-by-base-i.sparql&r=icdstore&i=" ] . [] tf:mapping [ tf:prefix "http://tfs.cc/icd10/term1/" ; tf:altPrefix "Icd10term1:" ] . # for TermFactory inserts queries 01.12.13 [] tf:mapping [ tf:prefix "/TermFactory/" ; tf:altPrefix "http://localhost/TermFactory/" ] . # whitespace allowed in alias 03.01.14 [] tf:mapping [ tf:prefix "this is a test" ; tf:altPrefix "home:/io/test.lst" ] . # nicknames for datastore scripts [] tf:mapping [ tf:name "table" ; tf:altName "home:/etc/scripts/select-triples.sparql" ] . [] tf:mapping [ tf:name "graph" ; tf:altName "home:/etc/scripts/collect-triples.sparql" ] . [] tf:mapping [ tf:name "count" ; tf:altName "home:/etc/scripts/count-triples.sparql" ] . [] tf:mapping [ tf:name "list" ; tf:altName "home:/etc/scripts/list-named-graphs.sparql" ] . [] tf:mapping [ tf:name "lint" ; tf:altName "home:/etc/scripts/list-nonempty-named-graphs.sparql" ] . [] tf:mapping [ tf:name "load" ; tf:altName "home:/etc/scripts/load-graph.sparql" ] . [] tf:mapping [ tf:name "drop" ; tf:altName "home:/etc/scripts/drop-graph.sparql" ] . # boilerplate get [] tf:mapping [ tf:prefix "bot" ; tf:altPrefix "http://localhost/TermFactory/query?u=URL&I=URL" ; rdfs:comment "boilerplate get"@en, "toistokaava"@fi ; tf:example "http://localhost/TermFactory/query?u=bot&i=tfs" ] . # assembly request aliases (second arg is regex) [] tf:mapping [ tf:prefix "list " ; tf:altPrefix "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Flist-named-graphs.sparql&r=--asm%3D" ] . [] tf:mapping [ tf:prefix "lint " ; tf:altPrefix "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Flist-nonempty-named-graphs.sparql&r=--asm%3D" ] . [] tf:mapping [ tf:pattern "list (.*) (.*)" ; tf:altPattern "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Flist-named-graphs.sparql&r=--asm%3D?1&i=?2" ] . [] tf:mapping [ tf:pattern "lint (.*) (.*)" ; tf:altPattern "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Flist-nonempty-named-graphs.sparql&r=--asm%3D?1&i=?2" ] . [] tf:mapping [ tf:prefix "graph " ; tf:altPrefix "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Fcollect-triples.sparql&r=--asm%3D" ] . [] tf:mapping [ tf:pattern "table (.*) (.*)" ; tf:altPattern "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Fxcollect-triples-from-graph-by-name.sparql&r=--asm%3D?1&i=?2" ] . [] tf:mapping [ tf:pattern "drop (.*) (.*)" ; tf:altPattern "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Fdrop-graph.sparql&r=--asm%3D?1&i=?2" ] . [] tf:mapping [ tf:pattern "load (.*) (.*) (.*)" ; tf:altPattern "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Fload-graph.sparql&r=--asm%3D?1&i=?2+?3" ] . [] tf:mapping [ tf:pattern "replace (.*) (.*) (.*)" ; tf:altPattern "http://localhost/TermFactory/query?q=home%3A%2Fetc%2Fscripts%2Freplace-graph.sparql&r=--asm%3D?1&i=?2+?3" ] . # assembly prefix aliases [] tf:mapping [ tf:prefix "tdb=" ; tf:altPrefix "--asm=tdb --name=" ] . [] tf:mapping [ tf:prefix "dav=" ; tf:altPrefix "--asm=dav --name=" ] .

TF aliases are used by the TF tfget facility. The first mapping above is a name mapping that maps a complete URI to another. The other examples above are prefix mappings. Prefix mappings apply to URLs whose prefix matches the tf:prefix , and they replace this prefix with the tf:altPrefix .

Location mappings can also be used to abbreviate namespaces. A location mapping facility is just a string prefix map, so any string prefix can be mapped to any other string prefix using it. The example maps the namespace prefix ont: to the URI slash namespace prefix http://tfs.cc/ont/ .

Rules with tf:pattern are TF specific regular expression pattern matching rules. They can be used to define rules with pluggable slots. If the pattern matches the uri (as a whole), it is replaced by the alternative pattern. If more than one pattern fits, the pattern with the longest match wins out. Note that pattern rules can do anything prefix rules can (and much more). The relation is straightforward:

## prefix rule
[] tf:mapping [ tf:prefix "ont:" ; tf:altPrefix "http://tfs.cc/ont/" ] .
## pattern rule
[] tf:mapping [ tf:pattern "ont:(.*)" ; tf:altPattern "http://tfs.cc/ont/$1" ] .

On the other hand, Jena style prefix rules are easier to write, their interactions are more predictable, and one does not have to worry about regular expression syntax.

The TF location mapper follows the Jena location mapping algorithm (map before lookup, longest match first) and prefers the longest matching rule. If there are alternative rewrites for the same URL, it only tries the best match. The null length prefix match is always the last choice. If there are tied rules for the same prefix, the last one wins.
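
A toy version of this preference order (longest matching prefix wins, later rules win ties, the zero-length prefix only as a last resort) might look as follows; the two rules in main are made-up examples, and the sketch ignores the name and pattern rule types.

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PrefixChooser {
    // rewrite an address with the best matching prefix rule, or keep it as is
    static String rewrite(String address, List<Map.Entry<String, String>> rules) {
        int bestLen = -1;
        Map.Entry<String, String> best = null;
        for (Map.Entry<String, String> rule : rules) {
            String prefix = rule.getKey();
            // ">=" so that the last of several equally long matching prefixes wins
            if (address.startsWith(prefix) && prefix.length() >= bestLen) {
                bestLen = prefix.length();
                best = rule;
            }
        }
        if (best == null) return address;                 // no alternative: use the original
        return best.getValue() + address.substring(bestLen);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> rules = new ArrayList<>();
        rules.add(new SimpleEntry<>("http://tfs.cc/", "http://localhost/tfs.cc/"));
        rules.add(new SimpleEntry<>("http://tfs.cc/owl/", "home:/owl/"));  // longer match wins
        System.out.println(rewrite("http://tfs.cc/owl/TFS.owl", rules));   // home:/owl/TFS.owl
    }
}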

The example and language tagged comments provide help with usage and examples in the TF query form. A mapping works just the same without them.

NOTE: A TF name/address like foo is not a valid URI, for a URI must contain a colon. In places where a third party tool (for instance a RDF database or a query engine) requires a URI, choose a TF name/address that contains colon, e.g. :foo .

TF plus prefixes

If an address contains a scheme part matching regular expression ^[^?+].*[+] (a possibly empty prefix not containing query or plus, followed by plus), the suffix after the plus is first location mapped. If it resolves to a URL, the URL is looked up.

If the suffix fails to resolve, the whole prefixed address is location mapped. TF plus prefixes thus provide a way to try alternative locations for a URL if the document at the primary location is not available. TF Get implements a restricted exclusive disjunction or if-then-else in location mappings. There is no backtracking, but mappings can define a deterministic decision tree. Get retries TF aliases until it succeeds, runs out of mappings, or exceeds the maximum number of hops (redirects) set in TF_HOPS.

For example, the empty plus prefix + in tfget +URL tries first to connect to URL, and if that fails tries to location map +URL.

Consider the following example mappings.

foo -> bar -> foo
foo -> +bar -> +foo
foo -> +foo -> bar

The first example is a redirection loop and resolves to null. The second example tries first bar, then foo. The third example first tries foo, then bar. For instance, given the mappings

foo ==> +dav:foo ==> +foo

tfget will first try dav:foo. If dav:foo is not located, it will try foo. (Compare to the loop example above.)

One must be careful when using the tfget facility to move between different versions of the same document. Depending on connections, different documents may get returned from the same address on different tries. This may be useful for mirroring static data, but during editing for instance, we do not want to accidentally revert to an older version, edit that and overwrite the latest one that just happened to be temporarily unavailable.

The query parameters format and encoding are recognised, but they are not passed through location mappings to subsequent query engine calls; the format and encoding of such calls are controlled by the location mappings themselves. For instance, the query pattern URL in the lm:mapping above might specify query?format=HTML&encoding=TF-16 . If format and encoding are not specified in the query string, a TF query service engine uses the defaults set in the conf ; failing that, compile time defaults.

Query parameter Q allows inserting the query text in the query uri.

The TF tfget facility constitutes a rudimentary RESTful style implementation of a TermFactory repository network . Location mappings can be used to redirect a query for a given uri to another TF instance.

TF home addresses

A TF home address is an address that starts with the TF_FILE prefix home:/ . The TF location mapper resolves the TF_FILE prefix with the value of $TF_HOME. This hardcoded location mapping helps express locations relative to a site's TF_HOME without exposing the file URL. The TF_FILE prefix home:/ can be configured to something else in TF properties.

TermFactory exposes certain filesystem directories through the Get facility. Currently the list is the following; it is hardcoded in TFProperties.java.

public static final String TF_HOLES = "home:/etc/ont-policy.rdf,home:/etc/location-mapping.n3,home:/etc/sparql,home:/etc/confs,home:/etc/aliases,home:/etc/queries,home:/etc/scripts,home:/etc/menus,home:/etc/skins,home:/etc/assemblies,home:/etc/templates,home:/owl,home:/io,home:/gf,home:/log";

Some TermFactory resource files collected in the $TF_HOME/etc directory may need to be web accessible at some designated public url. Here is one convention for doing so. Assume a Tomcat server holding the TF web services also serves the TermFactory webapp. This is true of the current TermFactory home server http://tfs.cc and is the likely default in other TF installations. (If not, just use some other suitable target for the mappings.) Put whatever resources need publishing under the TermFactory webapp url in the TermFactory webapp's root directory $CATALINA_HOME/webapps/TermFactory/ .

The TermFactory root server is at http://tfs.cc . The ontologies that it maintains are at http://tfs.cc/owl/ . In general, the default place for the ontologies held by a TermFactory site http://site is http://site/owl/ . Localization files for an ontology may be elsewhere, at a site that needs to provide a given localization language for an ontology. A site may mirror another site locally and map the remote site's urls to (more) local ones in its location map file.

The prefix alias home:/ is built in to point to the value of TF_HOME.

Another built-in TF scheme prefix is ibid:/ , which can be used in imports triples to tell the TF imports loader to resolve the address of an imported ontology against the address of the importing ontology. For example, the following triples

<http://tfs.cc/owl/foo.owl> owl:imports <ibid:/bar.owl> .
[] owl:imports <ibid:/bar.owl> .

tell the TF ontology loader to resolve bar.owl relative to the subject of the imports clause, or if the subject is blank, relative to the url from which the current graph is being loaded.

It may also be useful to have conventional site relative location mappings for some locations. The following conventional prefix aliases are defined in the default location-mapping.n3 :

## prefix alias conventions (home:/ built in for TF_HOME)
[] tf:mapping [ tf:prefix "host:/" ; tf:altPrefix "http://localhost/" ] . # host: is mapped to local host document root
[] tf:mapping [ tf:prefix "dav:/" ; tf:altPrefix "http://localhost/dav/" ] . # dav: is mapped to local host dav root
[] tf:mapping [ tf:prefix "app:/" ; tf:altPrefix "http://localhost/TermFactory/" ] . # TF: is mapped to local webapp root

For TF localization ontologies (that translate concepts in a given ontology), the TF naming convention is to hyphenate an ISO two or three letter language code prefix to the name of the ontology, for example fi-TFS.owl for the Finnish localization of the TFS schema. A localization ontology containing language independent codes and symbols is prefixed with cc as in cc-TFS.owl . The (nonreserved) language code cc is used for TermFactory language independent codes. A multilingual localization file is named all-TFS (with different schema profiles as above).

According to type of file, use the following naming convention:

content example
ontology name/alias TFS
unspecified format fi-TFS
unspecified language lion-TFS

The localization ontology for a given ontology alias like TFS can be located relative to the resolved address of the alias with TF plus prefixes as shown below. The plus prefix lion is mapped to lion:. The suffix is location mapped and then the resolved prefix (if any) is tacked back to the resolved address and the combination is mapped. The second mapping rule moves the prefix lion- next to the local name of the ontology it localizes, producing lion+TFS ==> lion::http://tfs.cc/owl/TFS.owl ==> http://tfs.cc/owl/lion-TFS.owl.

# lion: localization ontology prefix
[] tf:mapping [ tf:name "lion" ; tf:altName "lion:" ] .                       # map prefix "lion+" to "lion:"
[] tf:mapping [ tf:pattern "(.*)::(.*/)([^/]+)" ; tf:altPattern "$2$1-$3" ] . # map lion::foo/bar to lion ontology name foo/lion-bar
Defining aliases in the webapp

The Copy section of the TermFactory query form supports quick and dirty insertion of new aliases in the current TF_MAP using the Alias button. The real name or address goes in the "from" field and the abbreviation in the "to" field. The type of abbreviation is a name alias. No warning is given against duplicate aliases. For more sophistication, use the TermFactory editor on TF_MAP.
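For orientation, an alias added this way corresponds roughly to a tf:mapping entry like the following sketch in TF_MAP (it uses the mapping vocabulary shown in the location-mapping examples above; the alias name and the exact serialization written by the form are illustrative):

# the name alias TFS resolves to the real ontology address
[] tf:mapping [ tf:name    "TFS" ;
                tf:altName "http://tfs.cc/owl/TFS.owl" ] .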

apache2 URL rewriting

This section describes how to use apache2 URL rewriting so that a simple URL can stand for a TF tfget request for a TF concept like http://tfs.cc/exp/English ; that is, a short url like the first one below can abbreviate a long one like the second.

http://localhost/exp/English
http://localhost:8080/TermFactory/query?uri=http%3A%2F%2Ftfs.cc%2Font%2fEnglish

It is enough to turn on the apache2 rewrite module (mod_rewrite) and insert one rewrite rule. Assuming an out-of-the-box apache2 installation, the following addition to the default virtual host definition file etc/apache2/sites-enabled/default should do the job. (The ellipsis stands for the pre-existing content of the file. You need to restart apache2 after the changes.)

<VirtualHost *:80>
...
# to rewrite incoming urls as calls to TermFactory query service
# turn rewrite engine on
RewriteEngine on
# log rewrites to error.log (optional)
RewriteLog /var/log/apache2/error.log
# log level (optional, the default is 0 meaning no logging)
RewriteLogLevel 9
# redirect a uri of form exp/English to TF query uri
RewriteRule /(.*)/(.*)$ http://localhost:8080/TermFactory/query?url=http%3A%2F%2Ftfs.cc/$1%23$2 [B]
</VirtualHost>

Note that we cannot map from the original ontology uri containing a fragment identifier using url rewriting. The optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. As such, it is not part of an http GET request. That is to say, an http client does not send the fragment to the server; it requests the whole url and only looks for the fragment in the response at the client end. Since the fragment is not part of the GET http request, apache2 mod_rewrite cannot see or capture the fragment in the redirection match. For apache server end redirection to work, fragment identifiers must get rewritten into resolvable URLs at the client end. In the simplest case, all it takes is a replace of # with / in a TF uri before asking for it from a repository. These two forms of URI are then treated as aliases for the same resource in TF. (For discussion see here .)

A client side workaround that makes hash vocabulary point to entry pages instead of document locations is the following. Place the following index.html file in the directory indicated by the URI before the hash. The index file lists all the resources with that uri prefix, naming them as anchor locations. Then use client side javascript to redirect the hash locations to the corresponding entry files when the index file is loaded.

<html>
<head>
<title>Concept instance index</title>
<script type="text/javascript">
function ShowHash() {
  // alert("fragment ID is " + document.location.hash);
  if (document.location.hash) {
    page = document.location.hash.substring(1);
    if (page) window.location = page;
  }
}
</script>
</head>
<body onload="ShowHash()">
<h1>Concept instance index</h1>
<ul>
  <li> <a href="ctryCode">ctryCode</a> </li>
</ul>
</body>
</html>

The XML Resource Directory Language RDDL proposes a more general approach to this problem. RDDL is an extension of HTML designed to allow both human readers and software robots to find any sort of resource associated with a particular namespace. Instead of putting one thing at the end of a namespace URI, RDDL puts a document there that lists all the machine-processable documents that might be available. An RDDL document identifies each related resource by a resource element in the http://www.rddl.org/ namespace, which is customarily mapped to the rddl prefix. This element is a simple XLink (that is, it has an xlink:type attribute with the value simple) and its xlink:href attribute points to the related resource. Furthermore, the xlink:role attribute identifies the nature of the related resource and the optional xlink:arcrole attribute identifies the purpose of the related resource. An optional xlink:title attribute can provide a brief description of the purpose of the link. RDDL is not related to GRDDL , specs for reading RDF off of X(HT)ML documents using XSLT.

Tomcat URL rewriting

To make a Tomcat web application redirect URLs it receives, the Tuckey rewrite filter can be installed as instructed here.

When properly installed, the rewrite rules can be inspected at localhost address http://localhost:8080/TermFactory/rewrite-status.

Reasoning about terms

Much of the complexity of TF reflects the complexity of distributed management of large ontologies in the Web. The tactic is divide and conquer: divide large ontologies into pieces of manageable size, and manage redundancy using reasoning, inheritance, redirection, and other forms of sharing. The approach to ontology work taken in TF emulates Jaakko Hintikka's idea of "small models" in modal logic: instead of working with complete ontologies ("possible worlds"), make it easy to extract and merge working subsets from larger ontologies ("model sets").

The verbosity of RDF or OWL vocabularies goes hand in hand with reasoners that are able to add and remove the redundancy on demand. The advantage of the verbosity is the same as in natural language: each context of use can get by with fewer statements, given a vocabulary that fits its needs. As an example, take properties and their subproperties. A property can be specialised to a subproperty that is equivalent to a superproperty within its domain or range. So why not just use the superproperty? Because the subproperty is an alternative way to encode the type of the subject and object. Using the subproperty, the type triple is redundant. Instead of three words 'go by car', use one verb, 'drive'.
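A minimal Turtle sketch of the idea, using hypothetical example.org names rather than TFS vocabulary:

@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# the specialised property encodes the types of its subject and object
ex:drives rdfs:subPropertyOf ex:goesBy ;
          rdfs:domain        ex:Driver ;
          rdfs:range         ex:Car .

# a single triple with the subproperty ...
ex:john ex:drives ex:car1 .

# ... lets a reasoner re-derive the now redundant triples
#   ex:john ex:goesBy ex:car1 .
#   ex:john a ex:Driver .    ex:car1 a ex:Car .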

Ontology reasoning

Reasoning can reduce the traveling size of an ontology by letting it imply much more than it asserts. One can find for a given TF term ontology a set of axioms from which all statements of the ontology can be derived, and from which no axiom can be removed without losing statements in the theory. Conversely, many useful types of reasoning in terminological practice add redundancy that facilitates query, term layout and class/property inheritance. An axiom set for a term ontology is useful to have for editing and versioning because the set is minimal and edits only need to be done in one place. A redundant set of entailments can be useful for viewing and querying because it allows faster access to implicit content than online reasoning - provided the closure is not prohibitively large.

An axiom set is not generally unique for a given theory; there can be many minimal axiom sets and many of the same smallest size. At the verbose end, there is the notion of materialization , or inferential closure of the ontology, which contains all the (non-tautological) statements (in the ontology's vocabulary) derivable from the axiom set. This is unique for an ontology, but not necessarily finite even if the ontology is decidable. The closure of a TF term ontology is a finite (though possibly large) set. proof?

Total materialisation is adopted as the storage strategy in a number of popular Semantic Web repositories, including some of the standard configurations of Sesame and Jena. Based on publicly available evaluation data, it is also the only strategy which allows scalable reasoning in the range of a billion triples; such results are published by BBN (for DAML DB) and ORACLE (for RDF support in ORACLE 11g). ( Ontotext on reasoning strategies.) Query and retrieval are fast, because no deduction, satisfiability checking, or other sorts of reasoning are required. The evaluation of the queries becomes computationally comparable to the same task for relational database management systems (RDBMS). On the downside, upload/store/addition of new facts is relatively slow, because the repository is extending the inferred closure after each transaction for modification. In fact, all the reasoning is performed during the upload. Deletion of facts is also slow, because the repository should remove from the inferred closure all the facts which are not true any longer. The maintenance of the inferred closure usually requires considerable additional space (RAM, disk, or both, depending on the implementation). (Owlim Primer 2009)

Editing and reasoning

Subject to reasoning, a TF ontology is not just a set of syntactic objects (statements or triples), but a set of facts and rules. Editing, on the other hand, is primarily a syntactic operation. Triples, not facts, are edited. Given reasoning, a fact may stay in an ontology although a triple stating it gets deleted, if it follows from some other axioms that remain. Editing facts and rules is not just more complex, but brings in the much-studied but tricky problems of nonmonotonic reasoning and belief revision. The TF user manual makes some suggestions about how to tackle this.

Adding or deleting triples about named resources would seem straightforward enough, but reasoning causes complications already here. Reasoning causes redundancy and redundancy creates ambiguities. Deleting a theorem does not get rid of the axiom, and a reasoner will generate the deleted theorem from it again. Did the user mean to get rid of the axiom too? Which axioms? For instance, if the user deletes a > b, she may expect to get rid of b < a as well. But what to do about a > c . c > b? Redundant data is laborious and error prone to update, as is well known from database theory. It would be ideal to have a nonredundant axiomatisation of an ontology to edit. But nonredundant normal forms are not easy to come by, see admin section on normal forms .

One case of such redundancy concerns inverses. The TF HTML writer may use hardcoded knowledge about inverses in the TFS schema, plus inverse relationships from a schema given as the schema option, when it writes an entry. To compensate, the TF editor also deletes the inverses of properties marked for deletion.

Editing literals causes another kind of ambiguity. By the RDF standard, a literal is identified by its literal form (a string), datatype, and language (if its datatype is string). When literals are edited by hand, users may not provide datatype and language information. To compensate, the TF editor applies a loose equality condition on literals. Literal triples are added or deleted under string equality, ignoring datatype and language. In addition, text strings are whitespace normalized (initial and final spaces are trimmed and multiple internal spaces simplified to one) before comparing.

Editing anonymous resources (blanks) is tricky. Matching blanks involves RDF (or OWL) reasoning. Blanks express graph-scoped existential quantification over nodes. This is why blank nodes cannot be identified between graphs. A blank triple added to the same graph twice becomes duplicated until some reasoning is applied. It takes a reasoner to decide whether the active ontology "contains" a given blank triple in the edits (meaning: entails it in the sense of RDF entailment). Even then, the identification need not be unique, and we cannot tell for sure just what the result of the editing should be. Blank matching involves subgraph isomorphism, which is an NP-complete problem. Editing normally happens under syntactic identity, and blanks have no syntactic identity under RDF semantics.

Logically, blanks are existential variables whose scope is the graph at hand. A blank triple x a b equals (Ex) x a b & G where G is the blank node closure of x, or the smallest graph containing x closed under free variables. That is the smallest existentially closed graph containing x in the model. Deleting x a b should not falsify (Ex) x a b, only (Ex) x a b & G . One way to put it is that the atomic units of information (and editing) in a RDF model are its minimal closed graphs: constant triples and blank node closures.

A simple way to edit blanks safely is to factor blanks into URIs for the time of the editing. From a logical point of view, the de-blank factor does an existential instantiation (skolemization) of the model. During editing blank URIs behave as named resources and cause no problems. After editing, the blank URIs are written back into blank nodes. The re-blank factor does existential generalization. The HTML reader/writer and the Factor utility support such factors. Cf this post.
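A before/after sketch of the factoring, using the exp: vocabulary from the examples below; the skolem URI scheme shown is illustrative, not necessarily the one the Factor utility emits:

@prefix exp: <http://tfs.cc/exp/> .

# before the de-blank factor: an anonymous designation
[] exp:baseForm "koe" ; exp:langCode "fi" .

# after the de-blank factor: the blank is instantiated to a URI for editing
<urn:blank:b0> exp:baseForm "koe" ; exp:langCode "fi" .

# after editing, the re-blank factor turns <urn:blank:b0> back into a blank node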

Skolemisation works as long as the blanks all come from the same graph. In general, however, the editable triples may come from some bigger datastore, which needs to be updated with the results of the edit. Then there is no way to cross-identify the blanks, since the contents of the big datastore is technically another graph. There is no way out but to try to match blanks across models. From above, that means matching blank node closures. TF does such matching with SPARQL Update. An example update request is shown below.

PREFIX : <http://koe/>
DELETE { ?b1 :long :path . }
INSERT { ?b1 :wide :path . }
WHERE {
  ?b2 :a ?b1 .
  :this :is ?b2 .
  ?b1 :long :path .
  FILTER ( isBlank(?b1) && isBlank(?b2) ) .
}

This request edits the depth first graph shown earlier into the width first graph shown next to it. The only difference between them is the property :wide in place of :long, which occasions the DELETE of the :long property and the INSERT of the :wide property in the above update query. The trick is to describe the configuration of blank triples where the substitution is to take place, shown in the WHERE clause.

In general, when committing a deletion containing blanks to an ontology, care must be taken lest the deletion query match too much and delete more than the editor intended. If one wants to delete just a particular triple, the query pattern should contain its existential closure in the ontology. TF DESCRIBE queries which apply blank node closure to the resources they describe can be of help here.

The TF editor has a rudimentary blank node matcher that matches blank nodes across models under subgraph isomorphism. The matcher is used in the following ways. First, a blank triple in the edits is marked editable against a given active model just when its blank node closure is matched with one in the active model. Second, when a model is being subtracted from or added to another, blank triples are matched the same way. (Deprecated.)

Identifiers and more generally key properties can reduce the size of a safely editable subgraph. An identifier is a property that guarantees uniqueness (an inverse functional property). A key is a set of properties of a resource that together uniquely identify the resource. A graph all of whose blanks are either closed or unique in the graph can be safely edited. A simple example of an identifier is an owl:sameAs property identifying a blank with a named resource. Another example is the set of key properties identifying a designation or a term. Call a subgraph that only contains closed or identified blanks a unique closure (cf. inverse functional concise bounded description and RDF molecules).

The TermFactory describe selector uses the following criteria to determine if a given node is unique in the result set. When a node is unique, it need not be described further to make sure that the description is safely editable.

  • it is a literal
  • it has a name (uri)
  • it is owl:sameAs some unique node
  • it is a function of some unique node (object of an owl:FunctionalProperty or subject of an owl:InverseFunctionalProperty)
  • it belongs to a class that has keys (owl:hasKey), and all its keys are unique

The last two items depend on a schema ontology given to the query with option schema.

Here is a simple example. Say we know that David is Mary's ancestor, but the intervening forefathers are anonymous. Without schema information, the DESCRIBE query on Mary must list the blank node closure including her whole ancestry up to David, though the description depth is one. Given schema model begat.ttl that contains the triple :begat a owl:InverseFunctionalProperty ., the result set dwindles to one triple. If it is known that Mary has just one father, there is no need to describe deeper for an editable subset.

tfquery -d=1 -f=TURTLE -U=:Mary -F david.ttl

<urn:David> <urn:begat> [ <urn:begat> [ <urn:begat> [ <urn:begat> [ <urn:begat> <urn:Mary> ] ] ] ] .

Query result_size 5
http://localhost/TermFactory/query?&f=TURTLE&U=%3AMary&d=1&r=file%3Aio%2Fdavid.ttl&Z=2014-03-11T12:00:48.415Z

tfquery -d=1 -f=TURTLE -U=:Mary -S=home:/io/begat.ttl -F david.ttl

[] <urn:begat> <urn:Mary> .

Query result_size 1
http://localhost/TermFactory/query?&S=file%3Aio%2Fbegat.ttl&f=TURTLE&U=%3AMary&d=1&r=file%3Aio%2Fdavid.ttl&Z=2014-03-11T11:59:48.347Z

Another example illustrates uniqueness based on an owl:hasKey axiom. The first query below grabs the whole connected graph of blank nodes. In the second query, the schema keys.ttl has a key axiom exp:Designation owl:hasKey (exp:baseForm exp:langCode). With it, the TF DESCRIBE query can safely select a smaller editable subset for a blank that has unique keys.
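For reference, a minimal keys.ttl containing just that axiom might look as follows (a sketch; the file in the distribution may contain more):

@prefix exp: <http://tfs.cc/exp/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# a Designation is identified by the pair (base form, language code)
exp:Designation owl:hasKey ( exp:baseForm exp:langCode ) .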

tfquery -f=TURTLE -Q='describe ?b where { ?b exp:baseForm "koe" ; exp:langCode "fi" }' -F blex.ttl

@prefix exp: <http://tfs.cc/exp/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

[] exp:baseForm "koe" ;
   exp:langCode "fi" ;
   <urn:see> [ exp:baseForm "test" ;
               exp:langCode "en" ;
               <urn:see> [ exp:baseForm "Versuch" ;
                           exp:langCode "de" ] ] .

Query result_size 8
http://localhost/TermFactory/query?&f=TURTLE&Q=describe+%3Fb+where+%7B+%3Fb+exp%3AbaseForm+%22koe%22+%3B+exp%3AlangCode+%22fi%22+%7D&r=file%3Aio%2Fblex.ttl&Z=2014-03-12T05:28:14.805Z

tfquery -d=1 -S=home:/io/keys.ttl -f=TURTLE -Q='describe ?b where { ?b exp:baseForm "koe" ; exp:langCode "fi" }' -F blex.ttl

@prefix exp: <http://tfs.cc/exp/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

[] exp:baseForm "koe" ;
   exp:langCode "fi" ;
   <urn:see> [] .

Query result_size 3
http://localhost/TermFactory/query?&S=file%3Aio%2Fkeys.ttl&f=TURTLE&Q=describe+%3Fb+where+%7B+%3Fb+exp%3AbaseForm+%22koe%22+%3B+exp%3AlangCode+%22fi%22+%7D&d=1&r=file%3Aio%2Fblex.ttl&Z=2014-03-12T05:27:15.557Z

The selector does not separately check that the blank is a Designation, it just checks whether the keys are present and unique.

Property inheritance in OWL

One of the attractions of ontologies is inheritance. Instead of having to assign properties to each instance (say a term or expression) individually, one assigns shared properties to classes, whence they get inherited to class instances. In description logic, inheritance becomes classical logical entailment. OWL allows expressing some property inheritance by means of defining classes by description.

Take as an example the ontology of languages and language codes in TF. Class exp:Language is the type of concepts (puns) for individual languages like exp:English. exp:English is the type of all pieces of the English language, including English terms and expressions, that have language code "en". The language code "en" in turn is the base form of an expression cc-en-N , the designation of the term cc-en-N_-_exp-English that refers to English. Terms and expressions that belong to English should have the language code "en". Having "en" as language code means being English. Intuitively, then, the first triple below should be equivalent to the second:

[] exp:langCode "en" .
[] rdf:type exp:English .

Indeed, this equivalence can be expressed in OWL as follows.

<owl:Class rdf:about="&exp;English">
  <owl:equivalentClass>
    <owl:Restriction>
      <owl:onProperty rdf:resource="&exp;langCode"/>
      <owl:hasValue rdf:datatype="&xsd;string">en</owl:hasValue>
    </owl:Restriction>
  </owl:equivalentClass>
  <rdfs:subClassOf rdf:resource="&exp;Language"/>
  <rdfs:comment rdf:datatype="&xsd;string">anything in the English language</rdfs:comment>
</owl:Class>

Here the element owl:Restriction defines a class by description, namely, the class of those things which have language code 'en'. Any instance that belongs to this class has that language code. Or that is what the axiom says. It is up to a reasoner to enforce this classification. (OWL restrictions tend to look ugly, so it may be convenient to invent a class whose main job is to entail restrictions. Then the restrictions are easily imposed by just subclassing from that class. exp:English above is an example - its instances inherit language code 'en'.)

A theory (a set of statements) in two-variable first order logic with equality is decidable for satisfiability (Mortimer). A canonical model of a theory is a maximal consistent extension of the theory which constitutes a model for the theory (satisfies the theory). A theory is (finitely) satisfiable if it has a (finite) canonical model. Not all ontologies have finite models. OWL allows expressing ontologies which have only infinite models (though not arbitrarily complex ones; the inference problem stays decidable, i.e. finite). Transitive functional properties can cause this. So can two independent transitive properties. A language (schema) all of whose theories (instances) are finitely satisfiable is said to have the finite model property (fmp). The finite model property implies decidability. A TF ontology consists of a concept ontology and a term ontology. A concept ontology can be any OWL theory, but a TF term ontology is finitely satisfiable: the TFS schema has no existential axioms which could spoil fmp. A TFS.owl term ontology has a finite inferential closure (materialization). Its size relative to the axiom set depends on the constructs used in the term ontology. (For a survey of description logic complexity results see Zolin .)

Here is an example of a sample expression ontology as an axiom set and its theorems. The theorems were generated with pellet extract . The closure is the axioms plus the theorems. Theorems can be checked against the axiom set with pellet entail . The TF tool pellet4tf can be used to extend an ontology with entailments relative to a given set of axioms.

Show/hide TF axioms

Show/hide TF theorems

Rules

Different varieties of rules have been proposed as extensions of description logic. Besides extending classical reasoning power past the confines of decidable description logic, some types of rules can do nonmonotone reasoning, signature transformation and factoring (rewriting the vocabulary of resource names and literals). Because rules can express more, they are less well behaved than descriptions. For the same reason, there is less agreement about what rules should be like. There is no W3C recommendation for a rule language, just a W3C recommendation for a rule interchange format RIF . A well known Semantic Web rule language is SWRL. SWRL is partially supported by Pellet .

Property inheritance with SWRL

SWRL is a w3c submission for a Semantic Web rule language. The proposal extends the set of OWL axioms with if-then rules. It thus enables such rules to be included in an OWL knowledge base. The proposed rules are of the form of an implication between an antecedent (body) and consequent (head). The intended meaning can be read as: whenever the conditions specified in the body hold, then the conditions specified in the head must also hold.

The property inheritance method of the previous section covers inheritance of properties from a class to its instances. For inheritance of properties over other properties, rules are needed. The following rule (in ad hoc human readable form) expresses a source inheritance rule: the sources of a term include the sources of its designation.

@prefix :     <http://tfs.cc/swrl/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix swrl: <http://www.w3.org/2003/11/swrl#> .
@prefix term: <http://tfs.cc/term/> .
@prefix meta: <http://tfs.cc/meta/> .

:SourceRule a swrl:Imp ;
  swrl:body (
    [ a swrl:IndividualPropertyAtom ;
      swrl:argument1 :x ;
      swrl:argument2 :y ;
      swrl:propertyPredicate term:hasDesignation ]
    [ a swrl:IndividualPropertyAtom ;
      swrl:argument1 :y ;
      swrl:argument2 :z ;
      swrl:propertyPredicate meta:hasSource ]
  ) ;
  swrl:head (
    [ a swrl:IndividualPropertyAtom ;
      swrl:argument1 :x ;
      swrl:argument2 :z ;
      swrl:propertyPredicate meta:hasSource ]
  ) .

Pellet supports reasoning with SWRL rules. A file that contains SWRL rules is loaded into Pellet and rules are parsed and processed. SWRL rules can be mixed with OWL axioms in an ontology. Rules will be applied to only named individuals in the ontology. Pellet supports all SWRL atoms described in the specification. SWRL builtins include among other things string processing primitives (substring search, concatenation etc.), so it is possible to create URIs with rules. As of Pellet 2.0, Pellet supports nearly all of the SWRL builtins . An alternative to SWRL rules for this purpose is SPARQL 1.1. function extensions.

Role maps

OWL 1 can express inheritance of classes: property slots and properties with fixed value. It cannot express inheritance of property triples between individuals (that takes three variables: if x is related to y and y has value z, then x has value z). For instance, conventionally, in TBX the scope of a source indication depends on its place in the tree. The interpretive convention is that the source indication gets inherited down the XML tree from the entry to terms in it. OWL 1 cannot make this inheritance convention explicit. It has to be expressed in other ways, for instance with queries or rules. (Compare section on rules and discussion .)

SWRL is undecidable since it can encode arbitrary role maps. Role maps are inclusions between role chains, for example a father's brother is an uncle . Role maps as a whole are a well-known undecidable class. Inheriting a property (e.g. hasSource) along another property (e.g. designationOf) can be formulated as a role map: hasSource includes designationOf hasSource. It turns out that with appropriate restrictions, this particular case is tractable. Decidability can be preserved by restricting expressivity to acyclic role inclusion axioms. These are sufficient for expressing property inheritance. An OWL 2 construct (complex property inclusion with an ObjectPropertyChain in a SubObjectPropertyOf axiom) codes this. See Horrocks/Sattler 2002 . Compare also OWL 2 Rules and this presentation .

The following sample SWRL in Turtle format exemplifies the use of SWRL rules to express the inheritance of source indications from expressions to terms:

Show/hide TF rules

The rule makes the example term inherit the source indication of its designation, as shown by the following pellet test run. The point of the example is that term en-example-N_-_exp:Example has inherited a source from expression en-example-N .

pellet realize TFRules.owl

owl:Thing
   exp:Expression - (exp1:en-example-N)
   meta:HasSource - (en-example-N_-_exp:Example, exp1:en-example-N)
   meta:Source - (meta1:User)
   term:Term - (en-example-N_-_exp:Example)

OWL 2 allows defining compositions of object properties as follows:

<!-- http://tfs.cc/term/designates -->
<owl:ObjectProperty rdf:about="&term;designates">
  <owl:propertyChainAxiom rdf:parseType="Collection">
    <rdf:Description rdf:about="&term;designationOf"/>
    <rdf:Description rdf:about="&term;hasReferent"/>
  </owl:propertyChainAxiom>
</owl:ObjectProperty>

The relation term:designates holds between an expression and a concept when there is a term whose designation is the expression and whose referent is the concept. The same technique can be used to define property inheritance; for instance, the property meta:hasSource may be made to include the composition term:hasDefinition o meta:hasSource .
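In Turtle, the latter inheritance axiom could be written roughly as follows (a sketch; whether the TFS schema actually asserts this axiom is a separate question):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix term: <http://tfs.cc/term/> .
@prefix meta: <http://tfs.cc/meta/> .

# a term inherits the sources of its definition:
# hasSource includes the chain hasDefinition o hasSource
meta:hasSource owl:propertyChainAxiom ( term:hasDefinition meta:hasSource ) .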

TF supports ontology transformation using SPARQL CONSTRUCT and UPDATE queries. The TF factor utility can do factoring of TF IRIs.

Querying as reasoning

A query language expresses questions about a dataset (a set of graphs/models). The answer may be yes/no, a list of bindings of answer values to variables (question words) in the query, or another set of triples. The answer to a query is computed by a query engine. Querying is one way, if not the main way, of accessing knowledge stored in a TF repository. The Web of Data concept builds on a network of linked data consisting of machine readable query endpoints parallel to the current network of human readable websites.

In Jaakko Hintikka's logic of questions, an answer to a question is a set of formulas in epistemic logic that entails the question, expressed as another epistemic formula. Consider the answers we won and we did not win to the question did we win (or not) . The logic is that the answer formula (I know that) we won entails the question formula I know that we won or I know that we did not win . Since the answer matches the question, it can be abbreviated to yes or no.

Question words are quantifiers into an epistemic context. I know what something is when there is something I know it to be. When the form of the answer reflects that of the question, an answer to a question can be abbreviated to a list of (bindings of) values to question words. So one can answer Who won WW2? with USA, Britain, ... , instead of USA won WW2 and Britain won WW2 and ... . This type of answer to a query constitutes a set of bindings of values to question variables to produce instances of the query in the data. But note that Hintikka's definition of answerhood also covers indirect answers, where inference intervenes between the question and the answer, or between the answer and the dataset.

In query languages, question variables are distinguished from the rest in some way depending on the query language. Query languages may have imperative variants as well, i.e. they don't just answer questions but change things on the basis of the answer to a query.

Many query languages for RDF and OWL have been proposed, including RDQL, SERQL and SPARQL for RDF, and OWL-QL, SPARQL-DL and SAIQL for OWL. SPARQL is the de facto standard. The TF query engine uses SPARQL.

SPARQL

SPARQL is the best known RDF query language and a w3c recommendation. Its syntax resembles that of the well established relational database query language SQL, which may be one of its selling points. The trend has for some time been toward RDF as against OWL. SPARQL has become richer, offering services expected from reasoners through syntactic enrichments. The Semantic Web is wilting, but the Web of Data is fighting on.

SPARQL has been under fast development, and many new features have been introduced to version 1.1. It includes SPARQL Update , an extension of SPARQL for committing updates to rdf datastores. SPARQL 1.1 also contains a specification for federated queries with SPARQL. SPARQL is increasingly able to carry out jobs that motivated some special purpose TF query and edit tools. The TF query engine uses SPARQL Update to implement RDF editing.

SPARQL query answering is based on the semantics of RDF. A query is answered by a set of subgraphs of the dataset that match (by graph isomorphism) the query graph. With this semantics, SPARQL only finds triples that are explicitly listed in the dataset. There is a provision in the SPARQL standard to extend SPARQL query answering to other entailment regimes beyond subgraph matching . Given such extensions, SPARQL can be used to extract entailments from a dataset.

Let us try to understand how SPARQL works from an example. The following query makes a table of concepts in the TF schema TFS.owl and their translations in Finnish from the localization vocabulary fi-TFS.owl. Lines starting with a hash are comment lines that are ignored by the query engine. Lines starting with PREFIX introduce abbreviations for namespace strings. The SELECT clause defines the result set. Here it is a table whose lines consist of pairs of concept instances (?inst) and expressions (?exp). Items starting with a question mark or a dollar sign are variables that the query engine fills in from the data. The FROM NAMED lines load the specified ontologies into the dataset as named models. Without NAMED, the loaded ontologies get merged into the default model and the provenance of individual triples is forgotten. The WHERE section is a graph pattern to match against the data. GRAPH wrappers restrict the triples to match to given named models. OPTIONAL triples are optional: if present, included, if absent, skipped. ORDER BY sorts the result set in ascending alphabetic order first by concept instance then by expression.

In combination with LIMIT and OFFSET, ORDER BY can be used to return a given slice of the solution sequence. In other words, ORDER BY applies before OFFSET and LIMIT. This can make the query very slow if the result set without LIMIT is large: the whole set is sorted before OFFSET and LIMIT are applied.
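For instance, a paging sketch (reusing the ont:Concept class from the full example query given after this sketch):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ont: <http://tfs.cc/ont/>

# page 3 of the concept listing, 20 rows per page: the whole solution
# sequence is sorted first, then rows 41-60 are returned
SELECT DISTINCT ?inst
WHERE { ?inst rdf:type ont:Concept . }
ORDER BY ASC(?inst)
OFFSET 40
LIMIT 20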

# select-concepts-with-lion-fi-from-tfs.sparql
#en table of TF schema concepts with their Finnish localizations
#fi taulukko TT skeeman käsitteistä suomennoksineen
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX ont:  <http://tfs.cc/ont/>
PREFIX term: <http://tfs.cc/term/>
PREFIX meta: <http://tfs.cc/meta/>
PREFIX exp:  <http://tfs.cc/exp/>
SELECT DISTINCT ?inst ?exp
FROM NAMED <http://tfs.cc/owl/TFS.owl>
FROM NAMED <http://tfs.cc/owl/fi-TFS.owl>
WHERE {
  GRAPH <http://tfs.cc/owl/TFS.owl> { ?inst rdf:type ont:Concept . }
  OPTIONAL {
    GRAPH <http://tfs.cc/owl/fi-TFS.owl> {
      ?term term:hasReferent ?inst .
      ?term term:hasDesignation ?exp .
      ?exp exp:langCode "fi" .
    }
  }
}
ORDER BY ASC(?inst) ASC(?exp)
SPARQL 1.1

The TF SPARQL query engine is SPARQL 1.1 compliant.

A SPARQL query FROM clause address must be an absolute IRI (have a valid scheme prefix), for example FROM <iri:foo> . The TF facilities automatically add the dummy prefix iri: in front of names that have not got a scheme prefix. The argument of a FROM clause must resolve to a graph (not a dataset).

According to the SPARQL standard , a sparql query only provides answers which simply entail the query. A query graph matches a data graph if the data graph entails the query. Blanks count as existentially bound variables. This makes writing sparql queries on data containing blanks a little tricky. In particular, one cannot change properties on an anonymous node using a sparql query, because the result of the query will not share blanks with the data. This problem can be circumvented by temporarily converting blanks in the data to named resources using the factor utility for the duration of a query and then back to blanks afterward. The factor deblank/reblank options do just that.

The TF SPARQL query engine with pellet OWL reasoner is not complete with respect to first order semantics. For instance, the pellet MIXED query engine answers yes to query ASK { exp:Bar rdf:type term:Designation } against the dataset below, so it is able to apply the definition of term:Designation. On the other hand, ASK { exp:Baz term:designationOf _:something } is answered no, because the reasoner does not do existential instantiation.

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ont:  <http://tfs.cc/ont/> .
@prefix term: <http://tfs.cc/term/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix exp:  <http://tfs.cc/exp/> .

term:Designation owl:equivalentClass
  [ rdf:type owl:Restriction ;
    owl:onProperty term:designationOf ;
    owl:someValuesFrom owl:Thing ] .

exp:Bar term:designationOf exp:Foo .
exp:Baz rdf:type term:Designation .

Realization in DL reasoning finds the most specific classes that an individual belongs to; i.e., realization computes the direct types for each individual. Realization can only be performed after classification since direct types are defined with respect to a class hierarchy. Using the classification hierarchy, it is also possible to get all the types for each individual.

Realization is typically a slow undertaking, since it in effect multiplies the number of instances by the number of classes. It usually does not make sense to ask TF DESCRIBE queries using the Mixed engine on larger ontologies, since the query can easily take more time than one cares to wait. One solution is to do realization on a large ontology once in a while and cache the results for direct consultation. Another way to go is to apply the TF Stacked engine.

Answer formatting in SPARQL is restricted. The answer of a SPARQL query can be yes/no (ASK queries), a list of bindings as text, xml, or json (SELECT), or a set of RDF triples constructed from the answer bindings and constant strings (CONSTRUCT). The result set can be filtered with functions that express syntactic conditions on resource names and literals. But there is no way to construct new resource names and literals from the answer bindings. This restricts the use of SPARQL as an RDF graph transformation tool (but see SPARQL-U or SPARUL). Such facilities are included in Semantic Web rule languages .

Ontology imports

Real world ontologies, for instance in the biomedical domain or in the Linked Data initiative, can become quite big. We are talking of hundreds of thousands, millions, or billions of entities. At that scale, ontologies become unmanageable unless some kind of divide and conquer tactics is available. (Desktop ontology editors can have difficulties with tens of thousands of items, not to speak of us human users who get lost with hundreds). There is a growing literature on ontology sharing and importing, ontology decomposition, modularity etc. On the level of the OWL standard, there is as yet little support for all this. The only primitive in the standard is the owl:imports element that creates import relationships between ontology resources. In terms of RDF graphs, an ontology is just another named resource, a graph node. An URL is identified as an ontology in a statement like

<http://puls.cs.helsinki.fi/med> rdf:type owl:Ontology .
Ontology as possible world

Hintikka's possible world semantics extended propositions of classical logic with an implicit possible world (called in some later work context) index. Propositions were not true or false as such, but relative to a possible world, described by a model set. A statement that belongs to a model set is true of the possible world/s represented by the set. The role of a web ontology graph is quite analogous.

Intuitively, an ontology too is a collection of statements. But which collection? That has remained somewhat implicit. An ontology is identified by an ontology resource URI ("ontology header"), and it is thought to consist of names of objects (resources and literals) and statements (triples) connecting them. But which triples officially belong to an ontology? The OWL standard does not explicitly say. The connection between an ontology URI and the associated resources and triples is normally not expressed in OWL. An ontology is just a resource among others. There are conventions based on physical document containment and URI sharing, but normally no explicit statement in OWL to the effect that a given triple belongs to a given ontology. The gist of the issue is that ontologies are identified by document, not as graphs, for the simple reason that RDF graph triples (unlike quads) have no context index.

The archetypal OWL RDF/XML ontology document format for simple ontologies was one where the ontology document URL matches the (only) ontology resource URI described in the ontology header, and the ontology URI is the URI prefix of all the resources "defined" in the ontology. But this is too simple to be true in real life. In the default setup, resources and statements are both "defined" in an ontology document and "belong to" that ontology. Triples in an ontology document are normally not marked in any way as being "in" one ontology rather than another. What triples "belong" in an ontology, and how the namespaces of the triples are related to that of the ontology URI, is not regulated.

Even the association of an ontology "header" URI to an ontology document URL is not standardized. The OWL recommendations talk loosely about "the ontology header", but there is nothing to enforce existence or uniqueness of such a header in an ontology document. There is no semantic or syntactic connection between an ontology document, the ontology header, and the triples in the document, any more than there is one between the name of a file and its contents. Membership of a statement in an ontology is expressible explicitly in RDF using reification , but it is not done in practice. Doing so with reification is quite inefficient, since four triples are needed to represent one.

RDF formats that add a fourth member to RDF triples for the "possible world" or context of the triple (making it a quad) have been provided for the purpose of SPARQL dataset i/o under the names NQuads and TriG . These formats go together with the notion of a dataset containing named graphs, a set of graphs identified by name that can be explicitly queried by graph name. Datasets and named graphs extend RDF toward modal logic.
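A small TriG sketch (the graph contents are illustrative): the same triple asserted in two named graphs, with the graph name playing the role of the context index.

@prefix exp: <http://tfs.cc/exp/> .

<http://tfs.cc/owl/TFS.owl>    { exp:English a exp:Language . }
<http://tfs.cc/owl/fi-TFS.owl> { exp:English a exp:Language . }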

There is an RDFS utility property rdfs:isDefinedBy for pointing a resource to a vocabulary where it is defined. That is not the same thing as occurring in an ontology. The same statement may occur in many ontology documents. Import statements create an imports graph of the ontology resources, and as a side effect pool together triples from the ontology document associated to (accessible by) the ontology URI. Do (all of) these triples thereby belong to the importing ontology? Hard to say. Another property in the RDFS vocabulary that might relate a resource to an ontology it occurs in is rdfs:seeAlso . It has a more generic meaning in the RDF Schema .

5.4.1 rdfs:seeAlso

rdfs:seeAlso is an instance of rdf:Property that is used to indicate a resource that might provide additional information about the subject resource.

A triple of the form:

S rdfs:seeAlso O

states that the resource O may provide additional information about S. It may be possible to retrieve representations of O from the Web, but this is not required. When such representations may be retrieved, no constraints are placed on the format of those representations.

The rdfs:domain of rdfs:seeAlso is rdfs:Resource. The rdfs:range of rdfs:seeAlso is rdfs:Resource.

5.4.2 rdfs:isDefinedBy

rdfs:isDefinedBy is an instance of rdf:Property that is used to indicate a resource defining the subject resource. This property may be used to indicate an RDF vocabulary in which a resource is described.

A triple of the form:

S rdfs:isDefinedBy O

states that the resource O defines S. It may be possible to retrieve representations of O from the Web, but this is not required. When such representations may be retrieved, no constraints are placed on the format of those representations. rdfs:isDefinedBy is a subproperty of rdfs:seeAlso.

The rdfs:domain of rdfs:isDefinedBy is rdfs:Resource. The rdfs:range of rdfs:isDefinedBy is rdfs:Resource.

Having the same URI prefix as an ontology URI proves nothing. An ontology's triples may (and typically do) contain vocabulary sharing a URI prefix with the ontology URL, but that is again conventional. An ontology document may quite well contain items that don't have the same base URI as the ontology (they typically do). There is no constraint between an ontology URI and the URIs of other resources in "it" (i.e. its defining document). In particular, nothing can be inferred from such conventions in RDF or OWL.

In OWL 1, the external URL of an imported ontology and the ontology header URI were related only indirectly. The standard says "An OWL document consists of optional ontology headers (generally at most one)". Protege 3 used to require that an imported document's xml:base attribute matched the imported URI. This convention was predicated on the default setup explained above. See http://www.w3.org/TR/owl-ref/#imports-def . This made it hard to locate imports. In OWL 2, imports are identified by URL or the document location, not by "internal" ontology URI.

OWL 2 adds a number of "should"s about how ontology graphs and documents are supposed to be related, but leaves things mostly as they were. See http://www.w3.org/TR/owl2-syntax/.

Since there is no necessary association between an ontology document URL and an ontology (resource) URI ("header"), many documents could contain and describe the same ontology resource (like any other resource), and one document could describe more than one ontology resource. It is just a convention related to Semantic Web addressing orthodoxy that an association between an ontology URI and an ontology document (accessible at that URI) exists and is one-to-one.

What happens if an ontology imports some URL, and that causes loading a document which does not contain an ontology header for the imported URI? Nothing specific happens in fact. If an imported ontology document has no (different) associated ontology resource description, importing it works like an include. If an ontology is too big to have in one file, it can be split into many in this way.

Ontology URLs are important for ontology imports. A statement of the form

<http://puls.cs.helsinki.fi/med> owl:imports <http://puls.cs.helsinki.fi/> .

tells that the subject ontology imports the object ontology. Even if the purpose is just to split the ontology into manageable pieces, technically, the pieces become ontologies with their own URIs. It may be safer to avoid using relative URIs in writing ontologies because they may cause trouble later if the ontology is split.

Resolving ontology URIs vs. URLs in ontology import statements can cause problems. An ontology loader goes and tries to locate a document describing the object ontology at the ontology URL. Failing that, it may ask for an association of the ontology URL with some file. Given a file, the loader looks for a RDF description about the ontology URL in it. If one is found, the file is loaded as a document describing the object ontology.

Jena resolves the object URI of an import statement to a physical URL against an external catalogue ont-policy.rdf . OWLAPI does not implement a catalogue mechanism; it is up to the user of the API to implement one. Protege 4 implements an XML catalogue mechanism. The current Pellet reasoner just fails if an imported URL does not point to a readable ontology.

Compare TF specific scheme prefix ibid. Further discussion:

http://protege.stanford.edu/doc/owl/owl-imports.html

http://www.nabble.com/Creating-local-repository-for-a-project-td24969842.html

http://protegewiki.stanford.edu/wiki/How_Owl_2.0_Imports_Work

http://answers.semanticweb.com/questions/1396/how-to-take-advantage-of-relative-uri-references-in-modern-owlrdf-editors.

In at least some versions of Protege the convention is that the xml:base attribute identifies "the" ontology defined in an ontology file. The idea behind this convention seems to be that the ontology resource uri should be the same as the xml:base prefix of the document containing it and that the shared prefix should also identify resources belonging to the ontology. A document imports the ontology that is mentioned in its xml:base attribute. The base attribute value must match the external URL of the ontology document, otherwise the ontology won't load. TF does not follow this convention. It prevents splitting an ontology into several files without changing the original namespaces. Things are supposed to change with Protege 4.1. In Protege 4.0 (build 113) import locations of Turtle (.ttl) files apparently are not cached, unlike imports of RDF/XML (.owl) files. Apparently in Protege 4, owl RDF/XML file locations are cached in per-directory XML catalogue files (catalogv00.xml), while Turtle files are not.

The TF ontology has contended with the question of ontology imports from its beginning as the Tekes 4M project ontology. Already at that stage, term ontologies formed an inheritance hierarchy using the owl:imports primitive. In terms of the following figure, loading the company instance database in the top right corner would import the ontologies needed by it directly or indirectly. For example, a ship engine numbered #12345 could belong to a class xyz of engines described in a company ontology, which could define it as a given type of combustion engine ABC , described in a more general ontology of diesels, and so on up.

TF Imports

Working experience with ontologies shows that large ontologies take a lot of space and time to process. The approach to ontology work taken in TF is "small models": instead of working with a large monolithic ontology, make it easy to extract and merge working subsets from larger ontologies.

Show/hide TF imports graph

TF imports
Bridge ontologies

For importing third party OWL ontologies to TF, one method is using hand-made bridge ontologies. Bridge ontologies are (preferably relatively small) ontologies which map between ontology vocabularies. They import all or part of the component ontologies plus define the contact points where the imported entities "plug in" to one another, by adding properties or concepts to relate them.

For instance, an excerpt of the YSO ontology is embedded into TermFactory using the bridge ontology YSO_bridge.owl . It embeds the root concept(s) of the excerpt under the appropriate node of the TermFactory ontology, and places any implied concepts imported from third party ontologies by the excerpt under node YSO_bridge in the TermFactory bridge namespace.

Importing a query

A selection of terms from one ontology can be imported to another using a TermFactory query url. A query may specify the collection of terms to import from an external ontology through some query language expression, saying something like "import from our India subsidiary all concepts subordinate to our concept SurplusTax plus their English and Hindi designations and definitions".

Import by query has some advantages over copy and paste. First, it saves manual work and errors. Second, since the query fetches the concepts by description, it also fetches the current version of the collection each time from the queried site. (It is of course also possible to query a given revision of a collection from a site that is under some kind of revision control.) Third, the imports may stay small and more manageable to work with.

Relayed queries

A relayed query is a query run on the query service of a different repository. Instead of fetching all the ontologies in the repositories and running the query locally, this type of query distributes the query over several sites and collects or merges the results from them.

Since queries can be expressed by URLs, nothing extra is needed in TF to implement while-you-wait relayed queries. It is just a query whose data set contains other query urls. No-wait relayed queries are not implemented yet, but see federated SPARQL.

The following query uri uses the TF tfget facility to fetch the ontology associated with a uri. (The query string looks hairy because of percent encoded characters. The encoding can be generated and parsed with TF, and anyway the ugliness can be glossed over with a clean url .)

http://localhost:8080/TermFactory/query?uri=%3chttp%3a%2f%2ftfs.cc%2font%2fFinnish

The following query URL also specifies repositories to serve as the dataset of the query.

http://localhost:8080/TermFactory/query?uri=ont:Finnish&r=repo1+repo2

Here is a schematic example of using the result of one query in the dataset of another.

http://localhost:8080/TermFactory/query?query=QUERY&r=http%3a%2f%2ftfs.cc/TermFactory/query%253furl=DATA

What this URL does is use the local query engine to run QUERY on dataset DATA obtained from tfs.cc. The query string of the embedded query is URL encoded twice to avoid capture in the top level query.

Since TF query services are also identified by URL, a poor man's TF site does not even need to have a local query service, it can use a remote one to query local ontologies as well as remote repositories (provided it has permission to do so). A TF site can thus also be a virtual construction implemented by a cluster of nodes.

With address redirection, query imports can be hidden from view. A TF alias can map a resource URI to a TermFactory query before loading the import. The http server at the remote site may redirect its incoming URLs (which satisfy some further condition) to the TermFactory webapp, hiding the involvement of TF query services from view.
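A hedged sketch of such an alias, using the tf:mapping vocabulary from the location mapping examples earlier (the URIs are illustrative): loading the concept URI as an import then actually fetches a TF query result.

# the resource URI resolves to a TermFactory query URL
[] tf:mapping [ tf:name    "http://tfs.cc/ont/Finnish" ;
                tf:altName "http://tfs.cc/TermFactory/query?uri=ont%3AFinnish" ] .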

Show sample ontology containing imported queries

DESCRIBE queries

For TF, the notion of a terminology entry is not built in on the level of data structure. Facts about a given term can be sought arbitrarily far in the graph surrounding ("about") the term. One way to give content to this relative notion of terminology entry is through the query language. The TF DESCRIBE query facility (based on SPARQL DESCRIBE query type) is a way in TF to support various notions of terminology entry. The DESCRIBE query facility allows users to define a fixed but customizable notion of what should be included in a terminology entry.

As discussed above, alternative terminological orientations are possible, such as a concept-based view, where an entry is identified by a concept and the terms related to that concept, or a term-based view, where a term is selected as an anchor, or a lemma-based lexicographical view, where an expression serves as the key of the entry. The DESCRIBE query may or may not need to vary with the view, depending on the level of control on the content required.

After choosing the subset to display, there is the level of terminography or entry layout. Here, decisions are made about the order and grouping of the information returned by a given DESCRIBE query template. Some layout may be supported by the query language. For instance, SPARQL SELECT queries give some control over the order and grouping of query results. Various RDF serializations have a few layout varieties. In TF, HTML layout can be controlled in detail using layout templates.

The SPARQL query language recommendation leaves the graph returned by a DESCRIBE query unspecified. The DESCRIBE form returns a single RDF graph containing RDF data about resources. This data is not prescribed by a SPARQL query, where the query client would need to know the structure of the RDF in the data source, but, instead, is determined by the SPARQL query processor.

The query pattern is used to create a result set. The DESCRIBE form takes each of the resources identified in a solution, together with any resources directly named by URI, and assembles a single RDF graph by taking a "description" from the target knowledge base. The description is determined by the query processor implementation and should provide a useful description of the resource, where "useful" is left to nature of the information in the data source.

If a data source has no information about a resource, no RDF triples are added to the result graph but the query does not fail.

The working group adopted DESCRIBE without reaching consensus. The objection was that the expectations around DESCRIBE are very different from CONSTRUCT and SELECT, and hence it should be specified in a separate query language.

See also proposals for a generic DESCRIBE query definition .

TermFactory DESCRIBE queries have been implemented using Jena and Pellet in TF subdirectory io in svn. The TermFactory query engine TFQuery extends the Pellet engine to DESCRIBE queries. It can be run from the command line as tfquery (or as pellet4tf query ). Each DESCRIBE query defines some notion of a TF entry. The TF DESCRIBE query works as follows.

The TermFactory DESCRIBE query handler applies to each item in the result set of the DESCRIBE query some user-customisable CONSTRUCT query recursively to the depth given by option TF_DESCRIBE_DEPTH. This value can be changed per query from the pellet4tf command line with option --depth (-d), and in a QueryForm query string with query option depth (d).

The description depth option controls the depth of recursion, not the depth of the result graph. If the query pattern of the DESCRIBE query has depth (maximum path length) larger than 1, description continues on the leaf nodes (the frontier) of the pattern. Using a complex description query is a way to fine-tune the shape of the RDF entry, rather like the HTML layout templates do.
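For instance, assuming the example repository used further below, the following invocation (illustrative only) describes ont:China one level deeper than the default:

pellet4tf query -U ont:China -d 3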

The PELLET query engine applies the Pellet reasoner, so the result set can contain triples which are not asserted but only entailed by the query dataset. For example, the TF schema ontology does not contain the statement that English is the referent of the Chinese term for English, but the query engine infers it from the converse statement (that the term has English as its referent). On the downside, the reasoning is slow. It is more efficient to convert the ontology to entry normal form offline and run a SPARQL query on the normalised ontology at runtime.

The default queries to be applied to the resources to describe are set in the conf. Currently, pellet4tf allows specifying different DESCRIBE queries depending on whether one does a SPARQL query (ARQ without Pellet) or a PELLET query. The SPARQL query is defined by TF option TF_SPARQL_QUERY, by default etc/tfs.sparql. The PELLET query is defined by option TF_PELLET_QUERY, by default etc/tfp.sparql. The canned DESCRIBE query can be changed per query from the pellet4tf command line with option --describe-query (-G) and in a QueryForm query string with query option describe-query (G).
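For example, a hypothetical invocation pointing the describe stage at a custom canned query file (the file name is made up for illustration):

pellet4tf query -U ont:China -G home:/etc/sparql/my-describe.sparql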

The default query basically lists all the triples in the model that contain the resource/s to describe as subject or object. The filter clause FILTER ( ?invp != rdf:type ) is a SPARQL way of saying not to include type triples where the resource occurs as object (there would typically be too many of them to be of interest). The type filter can be used to block the inverses of other similar many-to-one properties by adding to the input data axioms that classify them into the meta class meta:TypeProperty. Subject field classification like ont:hasSubjectField is a good example.
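The actual canned queries etc/tfs.sparql and etc/tfp.sparql are not reproduced here; the following is only a rough sketch in their spirit, assuming the query handler binds the resource to describe to a variable (here ?this):

CONSTRUCT { ?this ?prop ?obj . ?subj ?invp ?this . }
WHERE {
  { ?this ?prop ?obj . }
  UNION
  { ?subj ?invp ?this . FILTER ( ?invp != rdf:type ) }
}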

The filter can be modified, or the whole query rephrased, to generate more focused descriptions.

The Pellet query engine used in TF solves complex queries (containing triples with variables in both subject and object position) bottom up, by building large intermediate results that make such queries slow on bigger ontologies. One type of query that is bound to hang on a big ontology is an open query on the type relation (find all type triples, i.e. ontology realization).

The hardcoded query logic built into the TF DESCRIBE query execution does things that are hard to say in SPARQL:

  • blank node closure
  • recursive description of named resources to user settable depth
  • taxonomy of a class in up and down direction to a given depth (currently disabled)

The SPARQL DESCRIBE queries can then be kept simple, basically saying "list given properties for given subject". To get different results at different nodes, make the WHERE clause a union of alternatives.

The direction of recursion in TF DESCRIBE queries is controlled by DescribeSelector. The class defines a Jena selector that de/selects subjects or objects that are (not) to be described further. Review the DescribeSelector rules if a DESCRIBE query result seems to miss something or include too much.

The de/select rules can be set with the configuration option TF_DESELECT whose default is home:/etc/sparql/deselect.ttl. A config file is an RDF document that consists of rules that describe statements using the RDF reification vocabulary. A rule prevents recursive description on the object of a triple it matches if the rule has rdf:object meta:describe "false", and licenses recursion if it has rdf:object meta:describe "true". Subject (rdf:subject) rules govern recursion on the triple subject analogously. Specific rules (e.g. uri matches) override general rules (e.g. namespace matches). For instance, the first rule in the default configuration below prevents description of subjects or objects of triples whose predicate belongs to the OWL reserved vocabulary. But this general filter is overridden by positive special cases that allow recursion to objects of RDF(S)/OWL set theoretic predicates (rdf:type, rdfs:subClassOf, equivalentClass, unionOf, complementOf, and intersectionOf). The rules in the default config below define the default behavior of DescribeSelector (followed if there is no config file). Other ways to control query output are SPARQL query filters and HTML layout templates.

Show/hide TF describe selector config

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl2xml: <http://www.w3.org/2006/12/owl2-xml#> .
@prefix meta: <http://tfs.cc/meta/> .

# deselect.ttl
# rules for TF describe selector
# see TF manual

[ rdf:predicate [ meta:ns "http://www.w3.org/2002/07/owl#" ] ; rdf:object [ meta:describe "false" ] ; rdf:subject [ meta:describe "false" ] ] .
[ rdf:predicate [ meta:ns "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ] ; rdf:object [ meta:describe "false" ] ; rdf:subject [ meta:describe "false" ] ] .
[ rdf:predicate [ meta:ns "http://www.w3.org/2000/01/rdf-schema#" ] ; rdf:object [ meta:describe "false" ] ; rdf:subject [ meta:describe "false" ] ] .
[ rdf:predicate [ meta:ns "http://tfs.cc/meta/" ] ; rdf:object [ meta:describe "true" ] ; rdf:subject [ meta:describe "false" ] ] .
[ rdf:subject rdf:nil ; rdf:object [ meta:describe "true" ] ; rdf:subject [ meta:describe "false" ] ] .
[ rdf:predicate rdf:rest ; rdf:object [ meta:describe "true" ] ; rdf:subject [ meta:describe "false" ] ] .
[ rdf:object [ meta:ns "http://www.w3.org/2002/07/owl#" ; meta:describe "false" ] ] .
[ rdf:object [ meta:ns "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ; meta:describe "false" ] ] .
[ rdf:predicate rdf:type ; rdf:object [ meta:describe "true" ] ] .
[ rdf:predicate rdfs:subClassOf ; rdf:object [ meta:describe "true" ] ] .
[ rdf:predicate owl:equivalentClass ; rdf:object [ meta:describe "true" ] ] .
[ rdf:predicate owl:unionOf ; rdf:object [ meta:describe "true" ] ] .
[ rdf:predicate owl:complementOf ; rdf:object [ meta:describe "true" ] ] .
[ rdf:predicate owl:intersectionOf ; rdf:object [ meta:describe "true" ] ] .

# unique closure
[ rdf:subject [ meta:describe "false" ; owl:sameAs [ rdf:type owl:NamedIndividual ] ] ] .
[ rdf:object [ meta:describe "false" ; owl:sameAs [ rdf:type owl:NamedIndividual ] ] ] .
[ rdf:subject [ meta:describe "false" ] ; rdf:predicate [ rdf:type owl:FunctionalProperty ] ; rdf:subject [ rdf:type owl:NamedIndividual ] ] .
[ rdf:subject [ meta:describe "false" ] ; rdf:predicate [ rdf:type owl:InverseFunctionalProperty ] ; rdf:object [ rdf:type owl:NamedIndividual ] ] .
[ rdf:subject [ meta:describe "false" ] ; rdf:predicate [ rdf:type owl:InverseFunctionalProperty ] ; rdf:object [ rdf:type rdf:PlainLiteral ] ] .
[ rdf:object [ meta:describe "false" ] ; rdf:predicate [ rdf:type owl:FunctionalProperty ] ; rdf:subject [ rdf:type owl:NamedIndividual ] ] .
[ rdf:object [ meta:describe "false" ] ; rdf:predicate [ rdf:type owl:InverseFunctionalProperty ] ; rdf:object [ rdf:type owl:NamedIndividual ] ] .
[ rdf:object [ meta:describe "false" ] ; rdf:predicate [ rdf:type owl:InverseFunctionalProperty ] ; rdf:object [ rdf:type rdf:PlainLiteral ] ] .

The rules for unique closure stop recursion on blank nodes that are unique by way of having a functional relation to a named resource or a literal. (Functional paths longer than one are not explored.)

Asked to describe a concept, the TF DESCRIBE query engine recursively queries the graph neighborhood of all resources in the result set with a canned CONSTRUCT query and applies the same procedure to new nodes in the result set. The default recursion depth is 2 for named resources. Blank nodes are not subject to the depth limitation. The canned query can be set separately for SPARQL and Pellet query engines. With factory settings, the canned DESCRIBE query collects all properties of a resource one deep. The result set can be modified in many ways:

describe more/other resource/s (-D, --describe, -U, --uri)
Example: adding more resources to the query at the edges of the DESCRIBE graph pattern increases the depth of the description
change repository (-r)
Add more instances or facts to describe or add axioms/schema to infer more facts
change bridge schema for endpoint query or Stacked engine (-B, --bridge, TF_BRIDGE)
These are special cases of the previous point.
change inference engine/s (-e, --engine, TF_ENGINE)
Inference engines typically grow the graph matched by a query pattern, but inference axioms can also provide for special treatment of special vocabularies.
change canned DESCRIBE queries (-G, --describe-query, TF_SPARQL_QUERY, TF_PELLET_QUERY)
The default DESCRIBE graph patterns treat all properties the same and recurse properties in both directions.
change describe selector rules (TF_DESELECT_URL)
The selector rules provide for special treatment of special vocabularies to curb up- or downward recursion.
change describe query depth (-d, --depth, TF_DESCRIBE_DEPTH)
Default depth 2 is sufficient for showing concepts, their terms and designations.
Use HTML layout to filter properties selectively (-R, --root; -T, --template, --tree, ...)
HTML layout options allow fine-grained specification of properties shown per type of resource. Formatting through HTML thus provides a sophisticated query filter mechanism.

The pellet4tf command line application understands DESCRIBE queries in the following format. The resource to describe is given as the value of option -U. The resource can be given as a URI in angle brackets as shown below (the argument is in quotes to prevent parsing of the angle brackets by the shell). The query is against a default repository given in TF_REPOS .

pellet4tf query -U '<http://tfs.cc/ont/China>'

The resource name can also be given as a prefixed name (aka qualified name), provided the prefix is defined in TF_PREFIX_URL or in the query repositories. Here is such a query with its result in Turtle format:

pellet4tf query -U ont:China

@prefix ont: <http://tfs.cc/ont/> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix term: <http://tfs.cc/term/> .

exp1:en-China-N exp:baseForm "China"^^xsd:string .
exp1:en-China-N exp:catCode "N"^^xsd:string .
exp1:en-China-N term:designationOf term:en-China-N_-_ont-China .
exp1:en-China-N exp:langCode "en"^^xsd:string .
ont:China a meta:Object .
ont:China a ont:Concept .
ont:China a ont:Country .
ont:China a owl:Thing .
ont:China a sem:Meaning .
ont:China a sem:Place .
ont:China a sem:Role .
ont:China term:referentOf term:en-China-N_-_ont-China .
term:en-China-N_-_ont-China term:hasDesignation exp1:en-China-N .

The query engine used in the selection stage and the engine used in the description stage of a DESCRIBE query should sometimes be different. To find all that is known about some topic, the best bet is to use a reasoner all the way. For editing purposes, it may make good sense to use a reasoner to select relevant items for editing to be sure nothing is missed, but then describe those resources for editing with the plain SPARQL engine. It only makes sense to edit asserted triples, because implied triples will not go away unless one edits the axioms that imply them. Editing is a syntactic operation which should be applied to a minimal nonredundant representation.

To tell the TF DESCRIBE query facility to use different engines for the selection and description stages, the query engine option can be of the form ENGINE1+ENGINE2. For the editing scenario above, the option value should be SPARQL+MIXED. A single engine option value uses the same engine in both phases. The first engine in this composition is ignored if the query is not a DESCRIBE query. The Stacked engine has engine choices built in, so it cannot be composed with the other engines in this way. Compare but do not confuse with the Stacked engine option.
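For instance, a hypothetical invocation of the editing scenario above (the resource is only an example):

pellet4tf query -e SPARQL+MIXED -U ont:China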

Formats

There are a variety of formats to represent an RDF graph as text. The normative syntax for RDF is RDF/XML, an XML document format for RDF triples. Unfortunately, XML makes RDF look more complicated than it is. Moreover, there is no canonical or normal form for RDF. Different converters may generate different (though equivalent) serialisations for the same RDF.

Show/hide RDF/XML

A relatively human-friendly syntax for RDF is Turtle. Turtle is a simplified, RDF-only subset of Tim Berners-Lee's Notation 3. N3 has several features that go beyond a serialization for RDF models, such as support for RDF-based rules. Here is the result of a TF query about the concept of ISO country codes in Turtle format.

Show/hide Turtle

An RDF document may start with a list of namespace abbreviations (prefixes). The abbreviations have no global guarantee, but many conventional ones are in practice better known than the URIs they abbreviate. Turtle RDF triple terms are separated by whitespace, except that quoted strings can contain whitespace. Minor punctuation can be used to fold together similar triples. Triples that only differ by object can be written as the common subject and predicate followed by a comma-separated list of the objects. Triples that share a subject can be written as the common subject followed by a list of predicate-object pairs separated by semicolons. Each group ends in a full stop.
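For instance, the triples about ont:China in the query result above fold into the following (the same data, just regrouped):

ont:China a meta:Object , ont:Concept , ont:Country , owl:Thing , sem:Meaning , sem:Place , sem:Role ;
    term:referentOf term:en-China-N_-_ont-China .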

An overview of RDF and OWL file formats

A web ontology (RDF or OWL) is a many-dimensional graph that has no inherent orientation. It can be written to a linear character stream, or serialised, in many ways, and many ways have been proposed and implemented to date. Here is a table with implementation notes. Some layouts are essentially line-oriented, triple-per-line/element, others build some tree-like groupings. For triple-oriented notations, see Venn diagram here .

name | extension | description | pattern | comments
n-triple | .n3 | line oriented textual layout | subject predicate object . | special case of n3
TF3 | .tf3 | alphabetically sorted triple layout | | special case of turtle
Turtle | .ttl | tree oriented textual layout | subject ( predicate ( object , )* ; )* . | handles uri prefixes and rdf lists
Notation 3 | .n3 | a logic language over triples | subject predicate object . |
RDF/XML | .rdf, .owl | line oriented xml layout | |
RDF/XML-ABBREV | .rdf, .owl | tree oriented xml layout | | special case of RDF/XML
OWL XML | .xml | statement quad oriented layout for OWL 2.0 | (statement-type predicate? subject object) |

Besides Turtle and RDF/XML, OWL has a number of formats of its own. OWL 2 has a normative functional-style syntax. These OWL specific formats are not used in TF for now. Nothing prevents using them with third party tools that support them. There is an online converter between different OWL syntaxes at http://owl.cs.manchester.ac.uk/converter/restful.jsp.

RDF/XML

A weakness of RDF/XML for multilingual work is that there is no provision for coding property names containing reserved characters. The only RDF/XML representation for property names is the XML qualified name (QName), which necessitates the use of character encoding for non-Latin property names. Plain uri-references in percent encoding cannot be used as such because the percent character is not allowed in QNames (see discussion). Our solution is to use TF3 encoding. A more general solution would be to extend RDF/XML with an <rdf:Property rdf:resource=URI> element analogous to <rdf:Description rdf:about=URI> for triples.

An XML name has the following syntax:

Names and Tokens
[4]  NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5]  Name ::= NameStartChar (NameChar)*
[6]  Names ::= Name (#x20 Name)*
[7]  Nmtoken ::= (NameChar)+
[8]  Nmtokens ::= Nmtoken (#x20 Nmtoken)*

The following nonnormative recommendations are also listed:

  • The first character of any name should have a Unicode property of ID_Start, or else be '_' #x5F.
  • Characters other than the first should have a Unicode property of ID_Continue, or be one of the characters listed in the table entitled "Characters for Natural Language Identifiers" in UAX #31, with the exception of "'" #x27 and "’" #x2019.
  • Characters in names should be expressed using Normalization Form C as defined in [UnicodeNormal].
  • Ideographic characters which have a canonical decomposition (including those in the ranges [#xF900-#xFAFF] and [#x2F800-#x2FFFD], with 12 exceptions) should not be used in names.
  • Characters which have a compatibility decomposition (those with a "compatibility formatting tag" in field 5 of the Unicode Character Database -- marked by field 5 beginning with a "<") should not be used in names. This suggestion does not apply to characters which despite their compatibility decompositions are in regular use in their scripts, for example #x0E33 THAI CHARACTER SARA AM or #x0EB3 LAO CHARACTER AM.
  • Combining characters meant for use with symbols only (including those in the ranges [#x20D0-#x20EF] and [#x1D165-#x1D1AD]) should not be used in names.
  • The interlinear annotation characters ([#xFFF9-#xFFFB]) should not be used in names.
  • Variation selector characters should not be used in names.
  • Names which are nonsensical, unpronounceable, hard to read, or easily confusable with other names should not be employed.

Compact URIs or CURIEs are a proposal for abbreviating, with namespace prefixes, URIs or IRIs whose local name part is not a valid XML name. Sample CURIEs:

home:#start joseki: google:xforms+or+'xml+forms'

In OWL 2 concrete syntax, an IRI can be abbreviated as a CURIE. Adoption of CURIEs in Turtle has been discussed. It would be helpful for TF.

Below is an example of the TF entry for the concept of ISO country code in RDF/XML format.

Show/hide RDF/XML

The root element (tag) of the document is rdf:RDF, and the content is a list of rdf:Description elements, describing resources (rdf nodes). Ontology namespace prefixes are declared by the xmlns attributes. Namespace prefixes only help abbreviate uris. They make it easier to write and change uris textually should the namespace change. They are local (so they can change between rewrites) and have no fixed meaning (except for prefixes reserved by w3c, such as xml, rdf, rdfs, owl ...). The empty XML namespace prefix xmlns: defines bare XML element and attribute names.

The xml:base URI attribute sets the base URI for resolving relative RDF URI references. By default, the base URI is the URI of the document (its directory in the case of a file). The xml:base URI applies to all RDF/XML attributes that can have relative RDF URI references: rdf:about, rdf:resource, rdf:ID and rdf:datatype.

The node described by an rdf:Description is identified by an rdf:about, rdf:ID or rdf:nodeID attribute (if at all). An rdf:ID attribute on a node element can be used instead of rdf:about. It has the hash character built in, so that rdf:ID="name" is equivalent to rdf:about="#name". While there can be any number of descriptions rdf:about a given resource, rdf:ID is an XML ID, so it can only appear once in the scope of a given xml:base.
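For illustration, with a made-up base URI http://example.org/terms set as xml:base, the following two node elements identify the same resource http://example.org/terms#China:

<rdf:Description rdf:ID="China"> ... </rdf:Description>
<rdf:Description rdf:about="#China"> ... </rdf:Description>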

Blank nodes (anonymous nodes which have no resource URI) can be given an xml document local ID with the attribute rdf:nodeID. This is needed when the same blank node is referred to more than once. The node ID is local, meaning it may change between rewrites of the same document.

To recap: XML entity names can be used to abbreviate URIs in XML attribute values. XML namespace prefixes are used to abbreviate URIs in element names. xml:base attribute allows truncating absolute URIs into relative ones in its scope. These are all XML document related abbreviatory conventions to avoid writing URIs in full, nothing to do with the RDF graph or OWL ontology being described.

Here is a representative beginning of a typical OWL RDF/XML ontology document :

<?xml version="1.0"?> xml document declaration
<!DOCTYPE rdf:RDF [ <!ENTITY wn "http://www.ontologyportal.org/WordNet.owl#" > ... ]> rdf document type declaration, containing entity definition for prefix wn for use inside xml attribute values
<rdf:RDF xmlns="http://www.ontologyportal.org/WordNet.owl#" xmlns: defines the namespace for plain (unprefixed) xml elements
xmlns:wn="http://www.ontologyportal.org/WordNet.owl" xmlns:wn defines the namespace for xml element or attribute names prefixed with wn:
xml:base="http://www.ontologyportal.org/WordNet.owl" xml:base resolves (completes) relative URIs in its scope
>
<owl:Ontology rdf:about=""/> Ontology URI is xml:base (plus "" i.e. nothing). xml:base defaults to filename when not present.
<rdf:Description rdf:ID="WN30-101644373"> Description URI is xml:base plus "#" plus ID
<wn:word rdf:resource="#WN30Word-tree_frog"/> Property URI is xmlns:wn value plus "word", object URI is xml:base plus attribute value
<word rdf:resource="&wn;WN30Word-tree-frog"/> Property URI is xmlns: value plus "word", object URI is wn entity value plus attribute value
</rdf:Description>

TF ontologies typically contain resources from many different namespaces. Resources in them are identified by explicit prefixes or entity references instead of relative URIs. TF ontology URIs identify documents. The document/ontology URI is not related to the URIs of the resources defined inside the ontology. For instance, TF Schema ontology uri is http://tfs.cc/owl/TFS.owl, but no other resources in it have the same prefix.

RDF/XML can be abbreviated in various ways (see the standard ). A variant format called RDF/XML-ABBREV groups and nests triples sharing subject, predicate, or object nodes inside one another much the same way as Turtle does.

long form | abbreviated | comment
<some:property> <rdf:Description> ... </rdf:Description> </some:property> | <some:property rdf:parseType="Resource"> ... </some:property> | description node abbreviated as parseType attribute
<some:property> <rdf:Description> <some:attribute>literal</some:attribute> </rdf:Description> </some:property> | <some:property some:attribute="literal"/> | blank node with literal abbreviated as attribute
<some:property> <rdf:Description rdf:about="&my;Item"/> </some:property> | <some:property rdf:resource="&my;Item"/> | object node abbreviated as attribute
<rdf:Description> <rdf:type rdf:resource="&my;Class"/> </rdf:Description> | <my:Class/> | type abbreviated as element name

Turtle

Many think that Turtle and N-Triples are superior replacements for the obsolete RDF/XML format. Turtle is the preferred format if you want to write a few hundred triples by hand, and N-Triples is used to publish large RDF data sets like DBpedia. N-Triples is a line-based, plain text format originally designed for representing the correct answers of RDF/XML [RDFMS] parser test cases in the RDF Core working group. The format was designed to be a fixed subset of N3, and hence N3 tools can be used to read and process it. It is recommended, but not required, that N-Triples content is stored in files with an '.nt' suffix to distinguish them from N3.

In Turtle, (absolute or relative) URIs are surrounded by angle brackets. The directive @base can be used to resolve relative URIs against a given base URI. By default, the base is the URI of the document. In addition, namespace prefixes can be defined with the directive @prefix. Turtle base and prefix work the same way as the corresponding XML attributes explained above.

The Turtle format handles unabbreviated non-Latin IRIs fine. Character encoding problems can be best avoided by forgetting prefixes and keeping resource names as IRIs (e.g. <http://biocaster.nii.ac.jp/biocaster1#鳥インフルエンザ-N>). If many resource names share one common prefix, they can be abbreviated to relative URIs by setting the shared prefix as the @base URI of the document.
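A sketch of the idea for the example IRI above (note that relative IRIs resolve by the usual RFC 3986 rules, so the hash goes into the local reference rather than into the base; the property here is just a placeholder):

@base <http://biocaster.nii.ac.jp/biocaster1> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<#鳥インフルエンザ-N> a owl:Thing .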

Turtle is more restrictive about qualified (prefixed) resource names. More precisely, Turtle qualified names allow XML qualified name (QName) characters except the dot \u002e. TF3 encoding can be used for encoding non-ascii URIs in Turtle as prefixed resource names. The following rules list the characters allowed in Turtle names.

[30] nameStartChar ::= [A-Z] | "_" | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[31] nameChar ::= nameStartChar | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
[32] name ::= nameStartChar nameChar*

Blank node IDs can be written with the prefix _: followed by a Turtle Name. Alternatively, a blank node can be written with a matching pair of square brackets in front of or around its properties: [] :property :value or [ :property :value ].

Literal strings are surrounded by single or triple double-quote characters (triple-quoted strings may contain newlines). The datatype of a literal is written after the literal, joined to it with two carets (^^).
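A couple of illustrative triples (the first repeats one from the ont:China result above; the second is made up to show a language-tagged triple-quoted literal):

exp1:en-China-N exp:baseForm "China"^^xsd:string .
ont:China term:hasExplanation """An explanation text
can run over several lines between triple quotes."""@en .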

TF entries

One difference between many traditional approaches to terminology and TF is the absence from TF of one fixed notion of a terminology entry as an information container. The semantic network metaphor for information management adopted in TF is dual to the container metaphor inherited by traditional terminology management from hierarchical databases and, more recently, XML. The container metaphor comes from physical media like paper or magnetic tape. Containment among convex objects naturally forms tree structures. In a rooted directed tree, it makes sense to talk of nodes as bigger elements containing smaller elements. In an undirected tree or graph, all nodes are equal: any node can be taken as root or focus. Nodes do not contain one another; rather, they are visualised as dots connected by links. RDF and OWL graphs are not rooted or directed trees, nor always connected. Information concerning a given node may be distributed freely in one or more documents or repositories.

Orientation is one distinguishing feature between the disciplines of terminology and lexicography. Terminology is concept oriented, lexicography is lemma oriented. Orientation has to do with the choice of origin in the ontology graph and the order of traversal around it.

The TF HTML format allows switching between concept-oriented, lemma-oriented, and term-oriented layouts, as well as defining new ones.

Difference between terminology and lexicography

  • origin
    • concept: the country of China
    • lemma: the English noun China
    • term: China (meaning the country, not porcelain)
  • orientation

The three sample entries above were produced with the respective commands below. Each command loads two ontologies (-u) and converts into HTML a subgraph starting from a root (-R) filtered by a graph template (-T). This relies on the writer to select the relevant subgraph. With bigger ontologies, it may be more efficient to first extract a subgraph using a DESCRIBE query and only write that. The result may differ, depending on how the DESCRIBE query is set up.

pellet4tf query -u "http://tfs.cc/owl/tf-TFS.owl http://tfs.cc/owl/ctry/TFCtry.owl" -f HTML -T ont -R ont:China > ont-China.html
pellet4tf query -u "http://tfs.cc/owl/tf-TFS.owl http://tfs.cc/owl/ctry/TFCtry.owl" -f HTML -T exp -R exp1:en-China-N > en-China.N.html
pellet4tf query -u "http://tfs.cc/owl/tf-TFS.owl http://tfs.cc/owl/ctry/TFCtry.owl" -f HTML -T term -R term:en-China-N_-_ont-China > en-China.N_-_ont-China.html

TermFactory separates layout concerns from the real objects terminology is about, namely terminological resources: instances, classes, properties, and statements connecting them. The abstraction of terminological content from any fixed grouping or ordering of data leaves room for different mind sets or approaches to terminology work. Such approaches can be seen as different choices of granularity and orientation. Different choices may fit different tasks.

The Western standard normative approach to terminology founded by Eugen Wüster, codified in ISO terminology standards, aims at standardisation and harmonisation of international terminology. This starts with language independent concept analysis, where the expert community agrees on the referents of special language terminology, whatever they are called. In standardization, choices are made of standard expressions to designate the referents in each language. In harmonization, national differences are ironed out. The Western view of a well-behaved term comes quite close to the TF Term profile of the TF schema, depicted above .

Traditional descriptive lexicography distinguishes between expressions (lemmas) and terms (word senses). A traditional lexicographical entry is rooted in a lemma and enumerates the senses of the expression in some order, usually based on partial similarity of meaning or usage of the senses. A bilingual dictionary entry is rooted in a source expression and lists target language expressions arranged in a similar way, by shared or at least similar senses of the source and the target expressions.

Further approaches are possible. A term oriented approach (Kudashev) can wind out the graph starting from a term as root and enumerate other terms to which the root bears selected (e.g. semantic) relations, in the same language or in other languages. Such a contrastive layout can be particularly useful for translators.

This freedom from fixed structure offers both advantages and disadvantages where format conversion is concerned. It can make conversion toward TF easier, because the target format is relatively free; no particular topology need be followed in converting to TF, so conversion need not change the source topology. On the other hand, it means that conversion from TF to more rigid formats can involve search (like queries from relational databases). A query language supported by a reasoner can be of much help here. Because RDF or OWL serialisation to XML is not constrained, the resulting topology is rather unpredictable. This makes applying XML tools like XSLT to RDF/XML fragile. A standard serialisation of TF to some (more) fixed XML format, say TBX, is one solution to this. Another one is to divide and conquer the problem using TFS profiles.

TF formats

TF terms can be distributed through the collaborative user interfaces, they can be queried directly from the repository system, and they can be converted to other existing formats and distributed through existing distribution channels (electronic, print, mobile, etc.). Terminology exchange formats handled natively by TF include the LISA TBX terminology exchange format, the commercial SDL MultiTerm XML export format, and a customisable Excel table format. Further formats can be added on demand.

A TF native format designed specifically for HTML or XML editing of RDF is the TF HTML format.

HTML

This is an HTML format for viewing and editing TF entries on (X/HT)ML platforms. The format is not limited to TermFactory content, but can be used to view and edit any RDF.

Internet Explorer versions 8 and earlier by Microsoft do not show XHTML. Microsoft added support for true XHTML in IE9. "HTML-compatible" XHTML can be shown using the HTML media type (text/html) rather than the official Internet media type for XHTML (application/xhtml+xml). Version 38 of TF upgrades XHTML to HTML 5 and uses the HTML media type.

This format is implemented as a TF model to HTML writer/converter TF2HTMLWriter and the converse HTML to TF reader/converter HTML2TFReader. It represents TF models in the form of a sorted list of tree-structured concept-oriented entries. The visual rendering (skin) can be set using css and javascript. The HTML format is really just an alternative syntax notation for TF models. Like the other readers the HTML reader loads OWL imports if readAll is on. A TF HTML document can contain multiple lists of entries.

The HTML format document type is XHTML+RDFa where the TF model triples are coded as RDFa annotations. From this format, the TF model underlying the XHTML entry can be extracted using a standard RDFa distiller . RDFa abbreviates URIs using compact URIs or CURIEs .

The settings that control the output of the HTML writer are summarised here. A combination of settings constitutes a TF HTML style .

The many options of the HTML format give a lot of choices about displaying TF content as a webpage. Fortunately, it is not necessary to specify options explicitly unless one wants something out of the ordinary. TF site-wide defaults can be set in TF properties. User or task specific preferences can be set in a TF conf file and TF styles. TF styles can be named and stored in RDF or as Java properties.

For more convenience, the HTML writer writes the options it used in the HTML document head element as HTML meta tags. Correspondingly, the HTML reader looks into the header and uses any values it finds there, and writes the options it used as an entry layout resource with annotation properties corresponding to the HTML layout attributes into the model it constructs. Consequently, TF entries remember their HTML settings between rewrites.

Here is an example of a TF "entry" in HTML.

Show/hide HTML example

Here is an example of an HTML options header:

<head>
  <meta content="text/html" http-equiv="content-type"/>
  <meta content="TF2HTMLWriter" name="generator"/>
  <meta content="http://tfs.cc/TermFactory/etc/templates/sem.ttl" name="template"/>
  <meta content="http://tfs.cc/sem/Meaning" name="root"/>
  <meta content="http://tfs.cc/owl/wn/TFwn.owl" name="schema"/>
  <title>TF wn30:synset-entity-noun-1</title>
  <link href="/css/tf2xhtml.css" rel="stylesheet" title="server" type="text/css"/>
</head>

Here is the corresponding RDF graph entry resource written into the model shown in TURTLE:

[] rdf:type meta:Entry ;
   meta:generator "TF2HTMLWriter" ;
   meta:root <http://tfs.cc/sem/Meaning> ;
   meta:active <http://tfs.cc/owl/wn30/wn30entry.php?e=cat-noun-1.ttl> ;
   meta:schema <home:/owl/wn/TFwn.owl> ;
   meta:template <home:/etc/templates/sem.ttl> .

If there is more than one entry resource in an entry model, the reader chooses among them at random. The header is not included when an HTML entry is committed into an active ontology.

The TF HTML format keeps the content of text fields as preformatted text preserving whitespace, because whitespace is significant for the equality of RDF text literals. Two RDF literals are equal iff their texts (literal forms), language tags (if any), and datatype URIs are the same. If the preformatted text is too wide to fit, text is wrapped. The default stylesheet tf2xhtml.css provides an alternative setting (commented out) that does not wrap but adds a scroll bar. To query text fields ignoring whitespace or case, one may use the SPARQL REGEX filter.
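For instance, a sketch of such a filter (assuming the term: prefix http://tfs.cc/term/ and that definition texts are stored as literal objects of term:hasDefinition; the pattern is only an example):

PREFIX term: <http://tfs.cc/term/>
SELECT ?def
WHERE {
  ?x term:hasDefinition ?def .
  FILTER regex(str(?def), "country\\s+code", "i")
}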

Show/hide text literal

HTML templates

The layout of the HTML document can be customised with a template, itself written in RDF. A serialisation of an RDF graph, such as a TURTLE file, constitutes a spanning forest of trees whose nodes are RDF nodes and whose arcs are RDF properties. A TF HTML entry structure is a variant of this format. HTML entries are nested html lists of entries, properties and values. Different entry layouts come about through the choice of the roots of the forest, the choice and order of properties, the order of property values, and the depth of nesting.

With template explicitly set to an empty model, the HTML writer produces a property tree isomorphic with TURTLE output, with triples sorted and grouped alphabetically by subject, predicate and object. The template library at etc/templates includes two simple templates wf and df for producing flat or nested displays of triples.

Show/hide flat/nested displays

Above is a flat listing of a path of blank nodes generated with wf. A nested listing generated with df is shown below.

The default TF layout is ont , which points to the concept layout etc/templates/ont.ttl . It chooses concepts as roots and nests properties down to the level of terms and expressions. Properties are sorted concept properties first, followed by definitions and terms, followed by the rest. This layout as well as alternatives to it can be defined using RDF format template files. Technically, template files are OWL ontology files and as such can import other such files. The template imported by ont.ttl is shown further below.

Show/hide top level of concept entry template

The four options for entry layout mentioned above (the choice of root/s, the choice and order of properties, the order of property values, and the depth of nesting) are controlled by the layout template.

The choice of roots decides what counts as an entry for the listing. It depends on the template and the schema. Roots are ignored if there is no template. Otherwise, the following places are checked for roots in this order. The first place that yields roots is used; the rest are ignored. Roots can be given as values of an object property meta:hasRoot or as a string valued datatype property meta:root whose value is a whitespace separated list of named root classes or instances. A root class should be a subclass of a template root class. A root instance should be an instance of a root class.

  1. If an explicit --root (-R) option is given, it is used
  2. If the RDF document has a meta:Entry instance with property of form meta:root "...", it is used.
  3. If the template root class meta:Entry has properties of the form meta:hasRoot ... or meta:root "...", they are used (see the sketch after this list).
  4. If conf setting TF_ROOT has a value, it is used.
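A hypothetical template fragment giving roots in both forms (the class names and the content of the string are only for illustration; a real template would typically use one form or the other):

meta:Entry meta:hasRoot ont:Concept .
meta:Entry meta:root "sem:Meaning ont:Concept" .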

The given roots and resources typed by the schema as instances or subclasses of the given roots will be listed. If no roots are specified, or no items pass as roots by them, resources which have no incoming properties become roots. Listing starts from the alphabetically first roots. If there are no roots, all subjects are included, starting from the alphabetically first subjects.

All resources belong to class owl:Thing, so making owl:Thing a root class includes all instances. Class meta:Entry is disjoint from class meta:Object, so setting root to meta:Object excludes instances of meta:Entry.

Only schema reasoning is used to determine entries at write time, because reasoners tend to be slow with type realization on large instance bases. Schema reasoning is faster but not complete. There may remain instances in the data that are not listed as entries, though they could be proved to be members of the entry root class from the model and schema together, using type realization and enough time.

The choice of properties and the level of nesting is controlled by matching the input model with the structure of the template. A property of a resource is included in the listing if it matches a property of the current template node. If there is no schema, model and template properties are matched by identity. If there is a schema, there is a match if the model property is a subproperty of a template property by the schema. The first match is taken; there is no backtracking.

The reserved wildcard property meta:property matches all properties not otherwise matched. The wildcard object owl:Nothing matches nothing, so a matching triple with this object excludes a property from the listing. The wildcard subject owl:Thing matches any node. It is tried last if there is no other match.

The boolean property triple meta:Entry meta:invent "true" . in a template model tells the writer to invent inverse property names for properties missing inverses, by suffixing -InverseObjectProperty to any such property name. This makes sure those resources which only appear as objects of triples are included into the listing.

The order of properties is based on the reserved template sort property meta:cmp. By default, properties are sorted alphabetically by URI. A triple like rdf:type meta:cmp "1" . tells the HTML writer to use the string "1" instead of the URI to sort the property rdf:type by. Since numbers are sorted before letters, this causes rdf:type to get sorted first. NOTE! meta:cmp values sort as strings, not numbers, so 100 sorts before 9.

We may want to sort property values by content. For instance, terms should be sorted by language code. This can be accomplished by a template like the following.

term:Term meta:cmp _:lang1 ; term:hasDesignation [ exp:langCode _:lang1 ] ;

This template says that terms are sorted by the language code of their designation, because the value of the sort property equals the langCode property of the designation of the term.

A node template can be given the property meta:sortBy to control how properties of a node matching the template are sorted in printing. Currently, three values are implemented. The default value "property" sorts triples by property, as explained above. Value "value" sorts properties first by value, then by property: all properties whose values have a sort template as explained above are sorted first, and ties are sorted by property. Value "mixed" makes use of "hierarchical" sort properties whose values have a "decimal" dot, as in

rdf:type meta:cmp "1" .
term:contentOf meta:cmp "2.1" .
term:referentOf meta:cmp "2.2" .
term:hasExplanation meta:cmp "3" .

In the mixed sort mode, properties are first sorted using just the first half of the sort property value before the decimal dot (if any). In the above example, "1" gets sorted before "2.1" and "2.2", but the latter two are tied. After that, the sort goes on as in the previous case: first by value, and finally by property again (this time using the entire sort property string as usual). For instance, the above sort property settings cause type properties to get sorted first; after that, definitions and terms are first grouped by language code, and within each group, definitions are sorted before terms. This sort order comes close to the TBX term entry structure where terms are sorted into language sets.

For more control, blank leaf nodes inside anonymous templates can be used as bindable variables to control printing of nodes in the model graph. Blank leaf nodes in an anonymous template get bound to corresponding nodes in the model graph during the printing of an entry. A node is only printed if it satisfies the bindings of the template instance created during the traversal. Thus for instance, in the sample lexicographic template exp0.ttl , definition texts get printed out according to the language of the expression, because the _:lang node gets bound to the language of the expression when exp:Expression is traversed, and this binding is checked from inside _:Text at the time _:Definition is entered.

For maximum versatility, reasoning can be used to alter the topology of the model. For instance, if one wants to associate terms in a given language directly to definitions in the same language, one can add to the input model some property connecting them (for instance, sign:synonymWith can be used to link terms directly to definitions). New intervening node types to group items can also be added without changing the xhtml writer, for instance, to implement the TBX auxiliary notion of language set.

Other than the cases described above, template resource names are not matched against resource names in the model. (They might, but so far such fine tuning has not been called for.)

When there is a schema file specified, (the only or a random) rdf:type property of resource meta:Entry in the template is used as the superclass to select entries from the input model. Entries that are explicitly of the given class, or whose explicit types can be inferred by the given schema to belong to it, are included. Type reasoning is not applied to the input model, as this can take very long with large input files.

The template to write HTML entries with can be specified explicitly. The default value is etc/templates/ont.ttl, the model template for TF concept based entries. The default value can be changed in the conf. Template files can include other template files using the owl:imports property. (For examples, see etc/templates.) The command line option root can also be used to specify the output root filter from the command line.

The TF HTML format allows choosing between concept-oriented, term-oriented and lemma-oriented layouts using the template option with values in ont|term|exp. The default is a concept-oriented entry. For examples, see the section on entries. In order to implement some desired layout of the data, the topology of the model to write may need to be enriched (say, using a reasoner) so as to create more connectivity in the graph. For instance, in legacy terminology, a term and a definition are connected through a shared concept. To finish this section, here is the default template for showing concept entries.

Show/hide body of concept entry template
HTML skin

For decorating the HTML layout, there is a template property meta:css that can be used to add a CSS class attribute to a property or value. For instance, the triple rdf:type meta:css "sem" . marks the type property as a semantic property in the HTML. By the default stylesheet tf2xhtml.css , it gets color blue.

HTML option skin tells the HTML writer which CSS stylesheet and/or javascript link to write in the document header. If the value of the option ends in .css or .js, the appropriate link is written. Otherwise, both types of links are written with the corresponding filetype added. If the value is not an absolute url, it is written as such and relativized to the TermFactory root. The default skin is tf2xhtml.

The HTML writer links the HTML document to a CSS script named tf2xhtml.css. The link points to the TF server root location /TermFactory/css/tf2xhtml.css. (The writer also adds an alternate stylesheet link titled "local" that points to the location of the HTML document itself, to use as a fallback when the server root is not accessible.) This css file can be customised at will. The output template provides a meta property meta:css that lets the template instruct the writer to provide any RDF property element it writes out with one or more CSS class attributes. Then it is easy to associate desired CSS decorations to those class attributes. Advanced CSS selectors are able to associate decorations directly to RDF properties mentioned in the HTML, but going via HTML class attributes declared with meta:css can be useful as an intermediate level of abstraction.

The default stylesheet tf2xhtml.css has been intentionally left bland, so that the relationship of the HTML to the underlying RDF model is not obscured. Color coding is used to set off different property types. String literals (the only text that editor users normally need to type in) are in black. Entries are in the default color maroon. Concepts (meanings) are shown in blue, terms (signs) in green and expressions (forms) in dark yellow (orange). Entries, terms, and expressions are boxed in the corresponding color. Lighter shades of green and yellow indicate definition and text fields. Editable content is strong (boldface), read-only content watered out (transparent) italics. As a demonstration of the many things one can do with CSS alone, the sample stylesheet hides from view the term and expression URIs inside a given box until one hovers on the top of the box. CSS 'skins' are conservative in that they cannot reorder content, so it is usually still possible to edit the HTML through the skin. CKEditor normally applies source CSS when in WYSIWYG mode, so provided CKEditor succeeds in finding tf2xhtml.css, the WYSIWYG mode will show the styles.

For yet fancier control on the looks, the writer links the HTML document to a javascript source named tf2xhtml.js at location /TermFactory/js/tf2xhtml.js . The default script activates the property list bullets in the HTML so that ctrl-clicking on any one collapses the box under it (hides the property list from view). Ctrl-clicking on the bullet again expands the box again. Again, the javascript can be customised at will. CKEditor does not normally load javascript associated to a source. In the TF Editor, box collapsing/expanding in a document inside the editor is enabled with a custom js script /TermFactory/js/custom_config.js . With it, boxes in a TF2HTML document inside the editor text can be collapsed and expanded by ctrl-clicking on the bullet.

HTML tree

An RDF model is an unordered directed graph. The graph need not be connected, and the arcs (properties) and paths (property chains) can be reflexive and form loops and re-entrancies. The HTML layout serialises the graph as a forest of trees where re-entrancies and cycles (back links) are indicated by coindexing, i.e. a node IRI or blank ID can occur more than once in the layout.

When an RDF model is edited, back links may cause repetition of data, which complicates editing. In particular, a triple and its inverse may occur in the same layout, so editing or deleting one should presumably also affect the other. There is an HTML layout option --tree which lays out a forest of trees accessible from the selected root/s, suppressing back links so that no resource (except properties) occurs in the graph more than once. (Among other back links, reflexive properties are suppressed from this layout.)

XSL skin for tabular layout

Often in practical applications, some set of entries instantiates a common repetitive graph template, with just some slots open for editing. TF users can transform the HTML output into an HTML table using an XSL stylesheet tf2html2.xsl that forms part of the TF skin called tf2html2. The table stylesheet preserves the same structure as the graph (unordered list) representation, but arranges on each level value nodes on rows and properties in columns, proceeding recursively into embedded tables as necessary. Tabular html is parsed by applying a reverse stylesheet html2tf2.xsl that transforms the grid back to an unordered list. The stylesheets can also be applied manually from the command line with an xslt processor.

Here is a sample xslt command line. NOTE: The HTML5 doctype declaration <!DOCTYPE html SYSTEM "about:legacy-compat"> is not accepted by saxonb-xslt as such, but a dummy protocol handler can be added, see http://stackoverflow.com/questions/6917514/how-to-use-the-about-protocol-of-html5-in-xslt-processors .

saxonb-xslt -xsl:$TF_HOME/etc/skins/html2tf2.xsl -s:kkmax1.html

The default TF tabular skin tf2xhtml2 displays a TF grid. It provides a simple format for editing term equivalents, for instance.

Below is a TF Form ontology and its HTML grid display. Only information subject to editing is displayed.

Show/hide a TF Form document Show/hide a TF grid

Since the grid is a html table, CKEditor table controls should work normally. In addition, TF specific key bindings may be defined in the TF CKeditor's custom-config.js:

  • An entry at the cursor can be copied to the end of a grid with CTRL-SHIFT-N.
HTML entry schema

In the absence of an HTML schema, the HTML writer follows the given template literally. Properties in the template are matched literally with those in the model to write. If an explicit schema ontology is given, matching is subject to the schema. For instance, WordNet entries can be written with the generic meaning based sign schema etc/sem.ttl as the template option if one uses the TF WordNet schema owl/wn/TFwn.owl as the schema option:

factor template=etc/sem.ttl schema=home:/owl/wn/TFwn.owl io/entity.ttl format=HTML > io/entity.html

Show/hide Wordnet example

The schema option thus allows the same template to match a variety of different ontology formats. It is enough for the schema to subsume the properties used in the model under the properties mentioned in the template. (For efficiency, some TF Schema inverse relationships have been hardcoded into the writer, such as term:hasReferent/term:referentOf and term:definitionOf/term:hasDefinition. These settings cannot be overridden by a supplied schema option.)

HTML active ontology

The TF query facility and the editor are designed to allow editing just a part of a larger model. Separating the edited model (the subset returned by a query) from the active model makes it possible to edit selected parts of a large ontology without dragging all of it to the editor.

A terminology query can produce content from several different ontologies, also from different repositories through relays. The information obtained from the different sources may be crucial for understanding what changes are needed. At the same time, edits should only be done in the ontology currently under active development, one to which the current editing team has edit permissions. Some solution like the Protege facility of choosing the active ontology among the ontologies shown (greying out the rest) is indicated. OWL 2 supports per-statement source annotations using a variant of the RDF reification technique. It takes four more triples to annotate a triple in the ontology. Pending a more efficient technique for source indication (Ontotext has one), it seems easier to do a second query for editable content from the active ontology and use that to single out the editable elements at the edit interface level.

For this purpose, the TF Jena model HTML writer can be given the active model as an option. Given this option, the writer marks those triples that come from the active model with the attribute class="editable" and those that do not with class="readonly". An editing tool can then restrict editing to the active ontology, so that a subsequent save of the edits can be included in the right ontology. (Compare the forthcoming HTML 5 contenteditable attribute.)

HTML original and edits

The original and edits options are used to print differences between the original and the edited ontology. Entries and triples missing from the original are tagged with css class "deleted" and shown struck out by the default stylesheet tf2xhtml.css. Entries and triples missing from the edits are tagged with css class "added" and shown underlined. For an example, see the section on TF revision control.

HTML localization

The TF RDF to HTML writer takes as further options a localization vocabulary and language code. It then prints out localized labels for property names and values whenever it can find terms in the localization vocabulary tagged with the requisite language code. Conversely, the HTML to TF parser accepts the corresponding options and uses them to map the localized labels back to TF entity URIs.

The TF RDF to HTML writer and its converse, the TF HTML to RDF parser are parametrised with a localization model and lang code. Given these options, a TF model is written in HTML with property and value URIs labeled with strings taken from the localization model and language. Conversely, when an edited HTML document is parsed back into TF, labels used in the HTML document are mapped back to TF URIs by looking up corresponding localization terms from the localization model.

A location for localization files can be specified with TF option lion or as style option TF_LION in the conf. The default is home:/owl/all0-TFS.ttl .

HTML links

The TF to HTML writer adds hyperlinks to resources mentioned in the document. The target of the link is by default the URI of the resource. The hyperlink target can be remapped with a location mapping file. The default is etc/maps/lynx.n3. It can be changed with option TF_LYNX in the conf or with option lynx . Link mappings are TF location mappings so they can be kept in the MAPS collection. The link mapper applies mappings until it runs out of applicable mappings or TF_HOPS is exceeded. Below is an example of a link mapping that maps Wordnet URIs to their TermFactory mediawiki editor pages.

[] tf:mapping [ tf:prefix "http://purl.org/vocabularies/princeton/wn30/" ; tf:altPrefix "http://localhost/mediawiki-1.20.2/index.php/Special:TFTab/Wn30:" ] .
HTML headers

The HTML writer writes out the options it used when doing the write in HTML header elements. Here is a sample header element.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet href="tf2xhtml.css" title="local" type="text/css"?>
<?xml-stylesheet href="/TermFactory/css/tf2xhtml.css" title="server" type="text/css"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:exp="http://tfs.cc/exp/"
      xmlns:meta="http://tfs.cc/meta/"
      xmlns:ont="http://tfs.cc/ont/"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:owl2xml="http://www.w3.org/2006/12/owl2-xml#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:sem="http://tfs.cc/sem/"
      xmlns:sign="http://tfs.cc/sign/"
      xmlns:syn="http://tfs.cc/syn/"
      xmlns:term="http://tfs.cc/term/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<head>
  <meta content="text/html" http-equiv="content-type"/>
  <meta content="TF2HTMLWriter" name="generator"/>
  <meta content="home:/etc/templates/ont.ttl" name="template"/>
  <meta content="home:/etc/maps/lynx.n3" name="lynx"/>
  <title>TF ont:ctryCode</title>
  <link href="tf2xhtml.css" rel="alternative stylesheet" title="local" type="text/css"/>
  <link href="/TermFactory/css/tf2xhtml.css" rel="stylesheet" title="server" type="text/css"/>
  <script src="/TermFactory/js/tf2xhtml.js" type="text/javascript">
  </script>
</head>

When an HTML meta entry is converted into RDF, the HTML meta headers are converted into a blank node of type meta:Entry with matching TF meta namespace triples. Here are Turtle triples corresponding to the above header.

[] rdf:type meta:Entry ;
   meta:generator "TF2HTMLWriter" ;
   meta:lynx <home:/etc/maps/lynx.n3> ;
   meta:template <home:/etc/templates/ont.ttl> .

When writing a model back to HTML, the HTML writer looks for these values as defaults. As a result, an HTML entry can be round-tripped through RDF without having to restate the settings. (Among other things, this allows caching HTML entries in an RDF database.)

TF3

The TF3 (TF triple) normal form of a TF ontology is a variant of TURTLE that aims to minimise free variation. It lists triples sorted in alphabetic order without prefixes.

When ontologies are brought to TF3, generic text based diff tools and visualisations can show version differences.

TF adds to Jena's triple reader/writer library a pair of TF-specific triple readers/writers with the symbolic name TF3. The TF3 format is a special case of Turtle where statements are written one per line and sorted alphabetically. Blanks are named heuristically by first occurrence. TF3 is intended to remain a special case of Turtle, and as such readable as an RDF triple file by any Turtle reader. For details see here.
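
For illustration, here is a minimal sketch of what TF3 output could look like, assuming a hypothetical resource exp:baseForm with English and Finnish labels (full IRIs, one statement per line, sorted alphabetically):

<http://tfs.cc/exp/baseForm> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#DatatypeProperty> .
<http://tfs.cc/exp/baseForm> <http://www.w3.org/2000/01/rdf-schema#label> "base form"@en .
<http://tfs.cc/exp/baseForm> <http://www.w3.org/2000/01/rdf-schema#label> "perusmuoto"@fi .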

Show/hide TF3 example

TF formats 2
More TF formats

JSON

JSON (an acronym for JavaScript Object Notation) is a lightweight text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for most scripting languages. The JSON format was originally specified by Douglas Crockford, and is described in RFC 4627. The official Internet media type for JSON is application/json. The JSON filename extension is .json.

The JSON format is used for sending data over a network connection. It is primarily used to transmit data between a server and a web application or user agent, as a popular alternative to XML. TF writes RDF in JSON-LD format and reads it with the suffix .jsonld. JSON-LD is a W3C recommendation. SPARQL SELECT query result sets are formatted in JSON using the Jena JSON serialization.

There have been several unofficial proposals for representing RDF in JSON. One JSON format for RDF is the Exhibit JSON format from the MIT Simile project. There is an online converter, Babel (here or here). JSON-LD has the media type application/ld+json and the preferred suffix .jsonld.

TSV

A tab-separated values or TSV file is a simple text format for a database table. TSV is a layout for results from SPARQL SELECT queries .

Lines correspond to rows in the table. Fields are separated with tabs. Here is an example of a TSV format listing of ontologies maintained by TF site ontology policy.

The listing was generated with the query http://localhost/TermFactory/query?q=home:/etc/scripts/select-tfs-ontologies-from-ont-policy.sparql&f2=TSV&z=.tsv . The result is as such a valid TermFactory listing. The first line is a header giving the column name; the TF listings reader skips it based on the initial question mark.
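
For illustration, the TSV result of such a graph-listing query might look like the following (the column name and the listed ontologies are hypothetical and depend on the site's ontology policy):

?g
<http://tfs.cc/owl/TFS.owl>
<http://tfs.cc/owl/TermFactory.owl>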

The TF query facilities have an option --global (-g) that governs the printing of resource names. The default is to print short (prefixed) names when available. The --global option prints full resource IRIs.
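
For instance, one and the same resource could print in the two modes as follows (illustrative resource name):

exp:baseForm                    # default: short prefixed name
<http://tfs.cc/exp/baseForm>    # with --global: full resource IRI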

TBX

TBX (Termbase Exchange format) is a localization industry ( LISA ) standard for the interchange of terminology data including detailed lexical information.

TBX has been republished as ISO standard 30042. The framework for TBX is provided by three ISO standards: ISO 12620, ISO 12200 and ISO 16642. ISO 12620 provides an inventory of well-defined “data categories” with standardized names that function as data element types or as predefined values. ISO 12200 (also known as MARTIF) provides the basis for the core structure of TBX. ISO 16642 (also known as Terminological Markup Framework) includes a structural metamodel for Terminology Markup Languages in general.

There is an online ISO data category registry for terminology at http://www.isocat.org/interface/index.html or http://www.isocat.org/files/12620.html . An older listing is here. The contents of the ISO registry are a mixed bag at present, but part of its terminological vocabulary has been adopted into the TF schema.

TBX is designed to support the analysis, representation, dissemination, and exchange of information from terminological databases (termbases). It is intended to qualify as a TML (Terminology Markup Language) as defined in the Terminology Markup Framework (TMF) specified in ISO 16642:2003. In addition, TBX is intended to support the extraction and merging of information from other, non-TMF-compliant formats, although these processes may involve some information loss.

TMF (ISO 16642) is an abstract data model to describe a potentially infinite set of Terminological Markup Languages (TML), that can be expressed for the interchange of computerized terminological data using, for example, XML. TMF does not describe one specific format, but acts as a kind of meta-model based on the following elementary notions:

  • The meta-model: a unique information structure shared by all TMLs and which decomposes the organization of a terminological database into basic components as shown in figure 1. This model is in keeping with the traditional concept-oriented view of a terminological entry dating back to Wüster’s early works [Picht & Schmitz, 2001] and widely adopted in the community;
  • Information units (which we refer to as data categories): derived as a subset of a Data Category Registry (DCR, see below) as needed for a given format. This may also contain additional data categories specifically defined for the current application, which may hinder interoperability with other formats;
  • Methods and representations: the means to actually implement the TML by instantiating the structural skeleton in combination with the chosen data categories, for instance by automatically generating an XML schema for the TML. This comprises the mappings between data categories and the vocabularies used to express them (e.g. as an XML element or a database field).

The TMF metamodel defines a hierarchical entry as shown in the following figure. Its top level describes a concept, the middle tier groups terms by language, and the lower levels describe terms and their parts.

Show/hide TMF metamodel

TMF metamodel

Although TMF abstracts away from XML concrete syntax, it sticks to the notion of a tree-structured entry. The structural elements do not provide any specific information from a terminological perspective, but rather contribute to the organization of the terminological entry. Specifically, structural nodes may serve two purposes: structure sharing (entries can share triples through a shared node) and property inheritance (properties such as source indications and other annotations can be inherited from a structural node). A tree is a special case of a graph, so the structure sharing function of structural nodes can be implemented in TF too. Property inheritance can also be expressed explicitly in TF using OWL axioms and rules.

TermFactory supports two way conversions between TF ontology format and the Term Base Exchange (TBX) standard format. TBX can thus be used as an exchange format through which ontology-to-terminology-to-ontology conversions happen.

TF2TBXWriter.java is a TF to TBX converter written in Java using Jena. There are some awkward spots in the conversion where the TBX data categories fail to match ontology language semantics. In particular, traditional terminology theory and TBX have less place than TF for the first order logic distinction between individuals and classes. Setting aside such differences, the TF to TBX and TBX to TF conversions are inverses in the sense that after an initial round trip, the TF3 form of an ontology file remains fixed under further round trips. (Since the TF to TBX conversion is lossy, the initial round may filter out elements that have no conversion mappings and add implicit information.)

Show/hide TBX example

NQuads

The NQuads format extends N-Triples with context. Each triple in an N-Quads document can have an optional context value:

<subject> <predicate> <object> <context> .

The provenance (source or context) of a triple is essential when integrating data from different sources or on the Web. Therefore, state-of-the-art RDF repositories store subject-predicate-object-context quadruples, where the context typically denotes the provenance of a given statement. The SPARQL query language can query RDF datasets. The context element is also sometimes used to track a dimension such as time or geographic location.
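
As a sketch with hypothetical IRIs and graph name, a single N-Quads statement whose fourth element records the source graph could look like this:

<http://example.org/Country_235> <http://www.w3.org/2000/01/rdf-schema#label> "Vietnam"@en <http://example.org/graphs/biocaster> .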

Applications of N-Quads include: exchange of RDF datasets between RDF repositories, where the fourth element is the URI of the graph that contains each statement; exchange of collections of RDF documents, where the fourth element is the HTTP URI from which the document was originally retrieved; and publishing of complex RDF knowledge bases, where the original provenance of each statement has to be kept intact.

TriG

TriG is an extension of Turtle to support representing a complete RDF Dataset. A TriG document allows writing down an RDF Dataset in a compact textual form. It consists of a sequence of directives, triple statements, graph statements which contain triple-generating statements and optional blank lines. Comments may be given after a # that is not part of another lexical token and continue to the end of the line.

TriG graph statements are a pair of an IRI or blank node label and a group of triple statements surrounded by {}. The IRI or blank node label of a graph statement may be used in another graph statement, which implies taking the union of the triples generated by each graph statement. An IRI or blank node label used as a graph label may also reoccur as part of any triple statement. Optionally a graph statement may not be labeled with an IRI. Such a graph statement corresponds to the Default Graph of an RDF Dataset.
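
Here is a small sketch of a TriG document using hypothetical example resources, with one unlabeled graph statement (default graph) and one named graph:

@prefix ex: <http://example.org/> .

# unlabeled graph statement: contributes to the default graph of the dataset
{
  ex:dataset ex:updated "2014-05-01" .
}

# graph statement labeled with an IRI: a named graph
ex:graph1 {
  ex:Country_235 ex:label "Vietnam" .
}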

LMF

ISO 24613:2008, Language resource management - Lexical markup framework (LMF), is the ISO/TC 37 standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. Its scope is standardization of principles and methods relating to language resources in the contexts of multilingual communication and cultural diversity.

LMF appears to follow the same Saussurean model as TF, except that an LMF lexical entry allows many senses. This is the lexicographical notion of a lemma-keyed lexical entry.

MultiTerm

SDL (previously Trados) MultiTerm is a commercial terminology management tool that has been on the market since the 90's. It is a searchable database which allows creating hierarchically organized multi/mono-lingual term entries with user-definable fields. It is sold standalone and as a part of the SDL (previously Trados) computer aided translation package.

SDL MultiTerm has an XML export format. A schema document for the format, MTF-schema.xml, could be found in the Docs subdirectory of TRADOS Freelance Edition 7. The schema document, written in Microsoft's obsolete XDR schema language, was converted into the XML Schema language using Microsoft's XDR to XSD converter. An XML schema MTF-schema.xsd for the format is included in the TF conversion library.

Conversion

TBX - TF conversion

There is a TF converter from TF ontologies into the TBX Basic term interchange XML format and an XSLT script to convert TBX into TF. A native TF reader based on the XSLT script is planned.

(version 0.0) The converter tbx2owl.xsl is an XSLT 1.0 script which transforms a TBX XML document into an OWL RDF/XML document.

TBX Basic is a commonly used simplified subset of TBX.

An XSLT 2.0 script tbx2tfs.xsl converts TBX Basic documents to TF. The converter comes with a separate RDF/XML document $TF_HOME/etc/tbx/tbx-mapping.rdf which spells out the element-by-element correspondences between TBX data categories and TFS.

The mapping file format is RDF. The mapping file consists of correspondence rules like

<rdf:Description>
  <tbx rdf:parseType="Resource">
    <name>admin</name>
    <type>versionInfo</type>
    <text>&DUMMY1;</text>
  </tbx>
  <tfs>
    <owl:Thing>
      <owl:versionInfo>&DUMMY1;</owl:versionInfo>
    </owl:Thing>
  </tfs>
</rdf:Description>

<rdf:Description>
  <tbx rdf:parseType="Resource">
    <name>descrip</name>
    <type>subjectField</type>
    <text>&DUMMY1;</text>
  </tbx>
  <tfs>
    <owl:Thing>
      <ont:hasSubjectField rdf:resource="&DUMMY1;" />
    </owl:Thing>
  </tfs>
</rdf:Description>

The tbx node of a rule describes the TBX element in terms of the element name, type and content. The tfs node describes the TF side. Its top node defines the subject of the target graph, and the graph under it gives the properties that correspond in TF to the TBX element. &DUMMY1; is a variable indicating content shared by both sides. other is a dummy for "other" TBX types. A match is exact if context, name, type and content all match, and partial if context and name match. A partial match is more exact if the type also matches. The logic followed by the converters TF2TBXWriter.java and tbx2tf.xsl in matching such correspondences is as follows.

  1. A given input matches a correspondence if it is an exact match to the source element.
  2. A partial match matches input except for &DUMMY1; and other .
  3. An exact match is preferred over partial matches, a more exact match is preferred over a less exact match.
  4. The first match wins if there are many.
  5. A target element is written only if context matches.
  6. If there is no target element nothing is written.
  7. Text matching &DUMMY1; in the source is substituted for &DUMMY1; in the target.

This logic supports various strategies:

  • In order to change some values and keep the rest, write exact match rules and a dummy to dummy general rule.
  • In order to keep some values and suppress the rest, write exact rules and a dummy empty target rule.
  • In order to pass values except some, write a dummy general rule and exact match empty target rules.
  • In order to suppress values except some, write a dummy empty target rule and exact match rules.
  • In order to convert values (except some) to a default, write a dummy to default rule (plus some exact rules).

The current version of the converter only handles one shared variable.
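
As a sketch of the "dummy empty target" pattern above, a rule that suppresses admin elements of any other type could look like the following (a hypothetical rule; whether the empty target is written as an empty tfs element or omitted altogether may depend on the converter version):

<rdf:Description>
  <tbx rdf:parseType="Resource">
    <name>admin</name>
    <type>other</type>
    <text>&DUMMY1;</text>
  </tbx>
  <tfs/>
</rdf:Description>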

TF to TBX conversion

The TFS to TBX converter is a Jena writer TF2TBXWriter.java embedded in the TF factor utility . It can be run with a command line like

factor ctryCode.owl TBX > ctryCode.xml

The converter uses the same rdf document (by default, etc/tbx-mapping.rdf ) as the TBX to TF converter for finding conversion correspondences.

With the factor command line switch --level=TRACE, the converter prints information about mapping rule matches:

factor -F Place.owl -f=TBX --level=TRACE
...
</termEntry>
<termEntry id="http://tfs.cc/term/hasDefinition">
  <descrip type="instanceOf">owl:ObjectProperty</descrip>
  <!-- *** No rule: *** http://www.w3.org/2002/07/owl# rdfs:subPropertyOf term:hasDescription -->
...

The conversion mapping between TF and TBX can be extended using bridge ontologies and a reasoner. Bridge ontologies define correspondences between TF properties and TBX data categories. Using a reasoner, properties in the TF ontology which are not covered by rules in the mapping file can be rephrased into properties which are covered there. For instance, it may be enough to define a general mapping rule for the class meta:Description and let the reasoner and bridge ontology entail the mapping from a variety of different types of description to that common class.
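
As a sketch of what such bridge axioms could look like in Turtle (the classes ex:UsageNote and ex:ScopeNote are hypothetical), specific description classes are simply declared subclasses of the common class meta:Description, so that a single mapping rule covers them all:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix ex:   <http://example.org/bridge#> .

# hypothetical bridge axioms: fold specific description classes into meta:Description
ex:UsageNote rdfs:subClassOf meta:Description .
ex:ScopeNote rdfs:subClassOf meta:Description .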

The general insight from our conversion efforts is that syntactic rewriting in terminology format conversion can be significantly simplified by using TF semantic conversion (RDF or OWL entailment with bridge axioms) as an intermediary. Also the conversion pipeline becomes more transparent because the TF internal conversion steps have a clear semantics.

Show/hide TF conversion

TF conversion

TF to cparse conversion

A TF terminology can be converted into multilingual lexicons for the constrained language parser/generator cparse . (version 0.0) The converter (Terminator2) is implemented in Java using the Jena RDF/OWL library.

Ontology bridging

The mapping of a third party ontology to TF can take the form of an OWL ontology that (recursively) imports sparql CONSTRUCT queries on one or more other ontologies. This type of conversion document gets automatically updated whenever the imported ontologies change. Although a conversion may be too complicated to do with one sparql query, by query imports it is possible to join results of several queries in parallel or in series so that the final composite outcome is the desired one. There are many ways of replacing conversion by ontology mapping or bridging:

  • use bridge ontology and reasoner to generate TF ontology triples from imported ontology triples at query time
  • import a query which generates the TF ontology from the imported ontology at load time

A complete conversion procedure can be coded in the form of an ontology which imports the results of a number of queries on the source ontologies and intermediate ontologies based on the same principle. At the leaves of the imports tree are instances of the ontologies to convert. The conversion ontology represents the conversion process and its result at the same time. The conversion gets in effect rerun every time the ontology is loaded. Since the process can take a while, TF caching is useful to avoid actually doing the conversion at every load. But by proper version control on the cache, the conversion can be set up to run automatically every time the input ontologies change. In this approach, conversion ceases to be a separate offline affair, and becomes an active part of the TF distributed repository access system. The conversion is not just a once-off process but a type of ontology, a virtual TF ontology.
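
A minimal sketch of such a conversion ontology follows; the conversion query legacy2tf.sparql, the legacy ontology legacy.owl, and the example.org site are hypothetical, while the query URL follows the pattern used elsewhere in this document:

@prefix owl: <http://www.w3.org/2002/07/owl#> .

# hypothetical conversion ontology: its only import is a TF query URL whose result
# is the CONSTRUCTed TF version of the legacy ontology
<http://example.org/owl/legacy2tf.owl> a owl:Ontology ;
    owl:imports <http://example.org/TermFactory/query?q=home:/etc/scripts/legacy2tf.sparql&r=http://example.org/owl/legacy.owl> .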

An even looser coupling to TF can be maintained with the help of bridge ontologies. In this case, the third party ontology is not converted at all. Instead, TF queries are carried out against it by bridging its concepts to TF with a separate bridge ontology. An example is the bridging of the BioCaster epidemic ontology to TF with the bridge ontology biobridge.ttl .

The BioCaster ontology is an example of a Legacy profile term ontology. It distinguishes concepts and terms, but not expressions. Here is an example of how terms look in it:

@prefix :     <http://biocaster.nii.ac.jp/biocaster#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix p1:   <http://biocaster.nii.ac.jp/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

:Country_235 rdf:type :Country ;
    :ISOCode "VN"^^xsd:string ;
    :hasLink :Wikipedia_04977 , :Wikipedia_04976 ;
    :label "Vietnam"^^xsd:string ;
    :synonymTerm :vietnameseTerm_1188 .

:vietnameseTerm_1188 rdf:type :vietnameseTerm ;
    :hasRootTerm :Country_235 ;
    :label "Việt nam"^^xsd:string .

The English term for a country is of type Country and has an English language label. It is linked to its synonyms in other languages which are linked back to it with the hasRootTerm property. Concepts, terms, and expressions are not separated as distinct entities. Biocaster can be construed as a special case of a TF Sign ontology where terms are self-referential and self-designative. These identifications can be spelled out as the following bridge ontology:

term:selfReferent rdfs:subPropertyOf term:hasReferent ;
    rdf:type owl:ReflexiveProperty .
:hasRootTerm rdfs:subPropertyOf term:selfReferent .
:Country rdfs:subClassOf :englishTerm .
:englishTerm rdf:type term:Term ;
    rdfs:subClassOf [ rdf:type owl:Restriction ;
                      owl:hasValue "en"^^xsd:string ;
                      owl:onProperty exp:langCode ] .
term:hasDesignation rdf:type owl:ReflexiveProperty .
:label rdfs:subPropertyOf exp:baseForm .

The first two axioms say that root terms are self-referential. The next pair expresses that country names are in English. The last two axioms say that terms are designations. Using this bridge ontology, a reasoner is able to parse BioCaster on the fly as a TF Sign ontology.

One possibility would be to link ISOcat to TF through its web service interface, as a collection of virtual TF ontologies. This could be tried with the ONKI ontology library too. However, as of fall 2011, ONKI is not planning to provide a SPARQL entry point (Osma Suominen, p.c.)

Third party terminologies are imported to TF in different ways depending on format. If the new terminology is in RDF, it may be enough to build a bridge ontology. A bridge ontology imports the third party RDF content, the TF schema and possibly other TF ontologies, and contains statements that connect the third party ontology to TF resources.

For example, in the 4M ontology, a terminology of ship diesel engines is related to TF via a bridge ontology that imports a generic engine ontology and the TF schema, and contains bridge statements that connect its resources to the engine ontology and/or relate its classes and properties to the TF schema. Another example is the BioCaster epidemic ontology, an OWL ontology that describes diseases and their names in many languages. Though its syntax is OWL, its class and property structure differs from TF. In the bridge ontology, the BioCaster property biot:englishTerm, which assigns an English term to a concept in BioCaster, becomes a subproperty of the TF property term:hasTerm.

An advantage of bridging is that no syntactic conversion is needed, as the work of relating the ontologies is done by the reasoner. The third party ontology keeps its vocabulary, can evolve separately and stays available for TF query at any time. On the minus side, the third party ontology cannot be queried in TF without a reasoner that applies the bridge. This difficulty can be solved by running the reasoner offline to produce a materialization of the third party ontology in TF.

If the third party terminology is not an ontology, some syntactic format conversion is needed. A variety of ontologies and third party terminology collections in legacy formats have been imported or converted to TF form to develop and test different conversion methods. The 4M project ontology, the historical starting point of TF, and the mobilite space ontology were early examples of importing third party OWL ontologies. A fragment of the Finnish YSO library ontology (in OWL Full) was extracted using Sesame RDF tools as an example of a subject field thesaurus ontology. A paper industry vocabulary was converted from MultiTerm format. A third party glossary of building management terms was chosen as an example of importing from a legacy format first to TBX and from TBX to TF. An example of ad hoc conversion is a multilingual glossary of chemistry terms converted directly from Excel sheets to TF OWL/XML using Perl.

XSLT conversion scripts from MultiTerm XML format to TBX and back are in the plans.

WordNet

WordNet

WordNet 2.0 was converted for the w3c consortium by Mark van Assem and later updated to WordNet 3.0 .

All of WordNet 3.0 has been translated into Finnish in the FIN-CLARIN project. For TF, the Finnish WordNet translations have been converted from xml to owl. The current TF WordNet conversion uses van Assem's w3 converter and format.

Show/hide Wordnet example

The advantages claimed for the w3c conversion over other versions are that it is complete, uses slash URIs, provides OWL semantics while still being interpretable by RDFS infrastructure, provides a Basic and a Full version, and provides URIs for words. (The Basic version provides a table of full synonyms: orthographic variants, abbreviations and such.) A previous conversion was done from XML using the XSL scripts fiwn-sumo.xsl and fiwn-vunl.xsl. The translations are in fiwn-all-sumo.owl and fiwn-all-vunl.owl, respectively. The TF WordNet schema contains a bridge between the two conversions.

Another conversion of WordNet 3.0 to OWL is available from SUMO site ontologyportal.org . This version has been linked to SUMO.

The data models of both WordNet to RDF/OWL conversions match the TF sign structure. Word, WordSense, and Synset are distinguished as subclasses of TF form, sign, and meaning. Both conversions have been bridged to TF. More precisely, wn/TFwn.owl bridges vunl WordNet to TFTop.owl, and TFwn.owl imports wn-schema-align-sumo-vunl.ttl which aligns vu.nl WordNet with SUMO WordNet.

The two conversions use different resource URIs for WordNet resources. van Assem's conversion uses descriptive synset URIs constructed from representative word senses. The SUMO conversion constructs synset URIs from WordNet synset IDs. Alignment files wn-schema-align.ttl, synset-align.ttl, and sense-align.ttl are provided between the sumo and vu.nl conversions. The schema alignment is provisional, since at the time of writing, the new vu.nl 3.0 schema files are unfinished and do not match the data files provided. The provisional schema alignment prefers the 2.0 schema vocabulary used in the datafiles over the namespaces used in the new schemas, to minimise the need of factoring the data. The instance alignment files define owl:sameAs mappings between the URIs of the two conversions.

There are numerous further differences between the conversions. Some relations that are sense relations in vu.nl are meaning relations in SUMO (e.g. antonyms, derivatives; here the vu.nl representation is more precise). The WordNet specific word class of derived ("satellite") adjectives is merged with other adjectives in SUMO (but reflected in the wn:pertainym relation). SUMO WordNet contains irregular word forms. The SUMO WordNet.owl.rdf had some schema and character errors that have been fixed in wordnet-sumo.owl.

Ad hoc conversion to TF

As an example of conversion from a typical tabular file format (entry per row, concept and language sets per column) through TBX to TF, we converted a multilingual welfare vocabulary via TBX Basic to TF. The table-to-TBX conversion was made with a simple Perl script txt2tbx.perl .

A conversion from TF back to PULS ontologies is in the plans. It will allow PULS to use ontologies developed in TF without changing the internals of the PULS system.

Other top ontologies

This section relates the TF schema TFS.owl to available large general purpose ontologies. Especially with large ongoing ontology efforts, it does not make sense to convert data to TF. That would undermine the whole idea of the Semantic Web. Rather than take a huge snapshot doomed to obsolescence, the more sensible solution is to use third party stores and original URIs as is and bridge them to TF.

Top ontologies
WordNet

WordNet is a large open source lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. WordNet has been converted by others to RDF more than once. We bridge WordNet to TF rather than do yet another version. See section on WordNet conversion .

WordNet 3.0 OWL has almost half a million URIs. FinWordNet adds another couple of hundred thousand. This is more than twice the number of word senses in WordNet, because synsets and senses have their own URIs in OWL. (WordNet has between one and two senses per synset on average.) We want to use TermFactory as a platform in a crowdsourcing effort to check the Finnish translation of WordNet.

WordNet consists of about 100K synsets. For editing the English-Finnish WordNet, TF splits the synsets into entry-sized files for English terms, Finnish terms, and (English) relationships. An entry is collected together from these pieces on demand with a PHP entry generator script, shown below. A URL of the form wn30entry?e=synset-entity-noun-1 generates the WordNet entry for synset-entity-noun-1 .

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix meta:    <http://tfs.cc/meta/> .
@prefix Wn30:    <http://tfs.cc/wn30/> .
@prefix Wn30en:  <http://tfs.cc/wn30/en/> .
@prefix Wn30enh: <http://tfs.cc/wn30/en/h/> .
@prefix Wn30fi:  <http://tfs.cc/wn30/fi/> .

[] rdf:type meta:Entry ;
   meta:active Wn30fi:fi-<?=$_GET['e']?>.ttl ;
   meta:schema <http://tfs.cc/owl/wn/TFwn.owl> ;
   meta:template <home:/etc/templates/sem.ttl> .

Wn30:<?=$_GET['e']?>.ttl rdf:type owl:Ontology ;
   owl:imports Wn30en:en-<?=$_GET['e']?>.ttl ;
   owl:imports Wn30enh:enh-<?=$_GET['e']?>.ttl ;
   owl:imports Wn30fi:fi-<?=$_GET['e']?>.ttl ;
   owl:versionInfo "TF WordNet version 0.2 21.01.2012" .

The current TF WordNet conversion uses the w3c namespace URIs.

SUMO

The OWL version of the Suggested Upper Merged Ontology SUMO (SUMO.owl) from Teknowledge (version 1.5.?) contains around 10K classes at the top. The full ontology with instances contains about 250K URIs, including names of airports, languages, cities, etc.

SUMO is an encyclopaedic ontology, while TF is geared toward terminology and natural language semantics. That is to say, TF semantic classes (like sem:Causer, sem:Doer ) name natural language semantic roles. They are evaluated by their ability to predict word choices. The TF schema is thinner and coarser than SUMO. Top classes in ontologies are particularly hard to match because more populous classes allow more splits. Often the best bet is to leave the uppermost topology alone and bridge lower down. SUMO has been tentatively bridged with TF in the bridge ontology SUMO2TFS.owl .

FinnOnto

The Finnish national FinnOnto project has built a sizable thesaurus ontology collection available at ONKI . Parts of it have been translated into English and Swedish. TF bridges to ONKI in the TF domain classification ontology TFSField.owl .

Repositories

Conceptual taxonomies and import relationships create some sort of a hierarchy in ontology content. TF sites that manage the content need not be hierarchically arranged. Sites may reciprocate in borrowing contents, and all TF repositories can communicate directly through the web. Non-TF sites and repositories in the Linked Data cloud are connected to TF over SPARQL endpoints and bridge ontologies. See here for an early blueprint with a more straightlaced view on things.

A TermFactory site is a web site identified by a site URL, owned by some organization, that maintains one or more TF back end repositories, one or more TF front end platforms/tools, and has some maintenance staff. A TF repository consists typically of one or more web servers with file space, one or more persistent-ontology databases, and TermFactory web services. TF repositories can communicate with one another through web services with and without the mediation of TF front ends. There are more ways to distribute the pieces, given that each piece is a standalone entity, but this is one likely scenario.

A system of TermFactory sites consists of RDF or OWL databases and documents and TF services connected to one another through web protocols. In the following figure, the cloud represents a system of TF sites and other RDF or RDF convertible data sources for TF. Yellow blobs are lexical resources and blue ones substance ontologies, the green overlap represents terms. The cloud is pyramidal to suggest inheritance relationships between the terminology resources that each repository "owns", the more generic resources nearer the top.

Repositories usable with TermFactory come in various sorts and sizes.

  • Files
  • Web documents
  • Read/write RDF databases
  • queryable SPARQL endpoints
Show/hide TF repository network
TF repository network

Each repository can store several term collections as OWL ontologies. A site's own terminology collection is named by a common URI (universal resource identifier), for instance, URI http://tfs.cc goes with the TermFactory ontology schema and top ontologies. Each site stores those ontologies which it owns and manages, plus it can cache or mirror ontologies which are owned by some other site.

(version 0.2) The current TermFactory ontology resides in http://grapson.com/TF/owl/TermFactory.owl

The TermFactory schema TFS.owl is maintained at http://tfs.cc/owl/TFS.owl

Converters

A TermFactory converter is a built in or user defined Java class that converts documents from a third party format to RDF on download. A converter implements the interface com.grapson.tf.util.Converter with method public abstract String convert (String str) throws Exception;.

A sample built in converter is com.grapson.tf.util.XMLConverter that can be used to convert HTML pages with XSLT to RDF. This converter is applied when the --conv argument is an xsl stylesheet (file extension .xsl). The actual location of the stylesheet is resolved with TF location mapping.

HTML web pages are in general not well-formed XML, which causes XMLConverter to fail on them. To tidy up HTML for XSLT, the input HTML can be preprocessed with a tidying script in some scripting language that implements the Java Scripting API. Java comes with JavaScript built in. Another script language available in TF is the Perl style language sleep. The name of the scripting language to use is set in the TF option TF_TIDY. The script has the same local name as the XSL script and a file extension matching the scripting language (.js for JavaScript, .sl for sleep, for instance). The actual location of the script is resolved by TF location mappings. If a tidying script is not found, the preprocessing step is skipped.

TF provides sample stylesheets for Wiktionary (wikt.xsl) and the European termbank IATE (iate.xsl) and the associated sleep scripts (wikt.sl, iate.sl) in the default TF_SCRIPTS directory. A query for equivalents of the search string 'book' in these sources is abbreviated by the default location mappings to wikt=book and iate=book, respectively:

tfget --conv=wikt.xsl "http://en.wiktionary.org/w/api.php?action=parse&page=book&format=xml"

abbreviated to

tfget wikt=book

Another built in converter is com.grapson.tf.util.JSConverter. This converter is applied when the --conv argument is a JavaScript file (extension .js). The actual location of the script is resolved with TF location mapping. The input to the script is taken from the JavaScript global variable str and the conversion result is read off the same variable.

Persistent repositories

A persistent repository is one that stays around after closing a session or connection, in contrast to one that is constructed on the fly in runtime memory and destroyed after use. It can be a sparql protocol endpoint with its own query service. It can be a triple store or a relational database which stores OWL data (as ontology triples or some more storage-efficient form). It can be an editable file repository like WebDAV. Or it can be just a static read-only web document URL.

Assemblies

TF provides a Jena assembler interface to connect and edit persistent RDF repositories. The TermFactory Jena assemblies and location mappings together constitute a uniform customisable way to access a variety of different persistent ontology repositories.

The Jena assembler library allows constructing RDF models and datasets according to a recipe also stated in RDF. The Assembler RDF vocabulary is given in the Assembler schema, whose conventional prefix is ja . The schema vocabulary is detailed in the Jena Assembler howto . TF extensions to the vocabulary are in namespace http://tfs.cc/etc/assemblies/ abbreviated by prefix ja4tf .

The TF directory /etc/assemblies contains the following sample assemblies:

tdb.ttl assembler for Jena TDB triple datastore
sdb.ttl assembler for Jena SDB relational database
owlim.ttl assembler for OWLIM-SE RDF database

New datastores can be instituted by copying mutatis mutandis one or the other of these samples, or by adding a new type of assembler.

A query specifying multiple datasets or endpoints as repositories is served by looping the query engine over the list of datasets specified as query respositories. Note that since queries to different datasets are handled separately, a multi-dataset query will not notice relationships that go across datasets.

The result of a query is returned as a dataset when the --quads (-4) flag is set. Provenance information can be coded in a query result graph as contexted triples, which TF knows to insert into the result dataset. Here is an example dataset CONSTRUCT query which returns a dataset back as it was. (It would surely look better if quad syntax was allowed in the CONSTRUCT part too, but that is not in the SPARQL standard yet.)

CONSTRUCT { ?s ?p [ rdf:value ?o ; meta:context ?g ] } WHERE { GRAPH ?g { ?s ?p ?o } }

The TF sparql query parser resolves the argument of a FROM clause against TF_BASE before it gets to TF location mappings. This prevents using plain TF aliases in FROM.

The TF default location mapper in /etc/location-mapping.n3 provides sample mappings for the prefixes dav tdb sdb owlim to the corresponding assembler documents. Some RDF repositories (including tdb and owlim) require modelIDs to be URIs. This requirement is minimally satisfied by having a colon in the modelID.
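
For illustration, such a mapping could be written in the same tf:mapping notation as the link mapping example earlier; the following entry is only a sketch, and the actual entries in the shipped location-mapping.n3 may differ:

[] tf:mapping [ tf:prefix "tdb" ; tf:altPrefix "home:/etc/assemblies/tdb.ttl" ] .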

TF only handles one datastore or endpoint per SPARQL query. A query specifying multiple datasets or endpoints as repositories causes an exception. What one can do is write subordinate TF query URLs for each dataset or endpoint and supply them as repositories. This can be automated further by writing boilerplate that runs the same query against different datasets and accumulates the results using the add (-p) option. The query alias list2 is an example of this.

Show/hide iterated repository listing query
multiple repository listing query
Input fish
Query_Results ( 1 answer/s limit 1000 )
----------------------------
| g                        |
============================
| <home:/io/fullfish.ttl>  |
----------------------------
http://localhost/TermFactory/query?q=%23+select-named-graphs-by-graph-iri-i.sparql%0A%23en+table+of+named+graphs+in+a+dataset+by+graph+name+%28iri-i%29%0A%23fi+luettelo+verkkojen+nimist%C3%A4+siilosssa+verkon+nimell%C3%A4+%28iri-i%29%0ASELECT+DISTINCT+%3Fg+WHERE+%7B+GRAPH+%3Fg+%7B+%3Fs+%3Fp+%3Fo+%7D+.+FILTER%28REGEX%28STR%28%3Fg%29%2C%22%28INPUT1%29%3F%22%2C%22i%22%29%29+%7D+ORDER+BY+%3Fg%0A%0A%0A%0A&i=fish&r=owlim%2B_2013-05-02T16:27:09.136Z

Input fish
Query_Results ( 5 answer/s limit 1000 )
------------------------------
| g                          |
==============================
| <Category:Edible_fish>     |
| <Category:Edible_fish.ttl> |
| <Category:Edible_fish2>    |
| <Category:Edible_fish3>    |
| <Category:Edible_fish4>    |
------------------------------
http://localhost/TermFactory/query?q=%23+select-named-graphs-by-graph-iri-i.sparql%0A%23en+table+of+named+graphs+in+a+dataset+by+graph+name+%28iri-i%29%0A%23fi+luettelo+verkkojen+nimist%C3%A4+siilosssa+verkon+nimell%C3%A4+%28iri-i%29%0ASELECT+DISTINCT+%3Fg+WHERE+%7B+GRAPH+%3Fg+%7B+%3Fs+%3Fp+%3Fo+%7D+.+FILTER%28REGEX%28STR%28%3Fg%29%2C%22%28INPUT1%29%3F%22%2C%22i%22%29%29+%7D+ORDER+BY+%3Fg%0A%0A%0A%0A&i=fish&r=gflex%2B_2013-05-02T16:27:09.900Z

WebDAV

The TF WebDAV directory provides TF users with a TF document repository space "in the cloud". DAV directories can be read in browsers. They can be written using WebDAV clients like cadaver, or mounted as web directories to local filesystems. Document collections stored in DAV can also be assembled to datasets.

Web Distributed Authoring and Versioning (WebDAV, aka DAV) is an extension of the Hypertext Transfer Protocol (HTTP). Defined in RFC 4918, WebDAV provides a framework for users to create, change and move documents on a server; typically a web server or web share. The most important features of the WebDAV protocol include the maintenance of properties about an author or modification date, namespace management, collections, and overwrite protection. Maintenance of properties includes such things as the creation, removal, and querying of file information. Namespace management deals with the ability to copy and move web pages within a server’s namespace. Collections deals with the creation, removal, and listing of various resources. Lastly, overwrite protection handles aspects related to locking of files.

The TF copy service stores documents in a DAV directory by the path of the document URL. This allows placing term entries and ontology documents so that related entries/documents are clustered together in the DAV collection. (See discussion.)

A WebDAV document can be accessed as any web document using tfget:

tfget http://localhost/dav/home/demo/foo/bar/baz --user=name --pass=word

DAV dataset assembler

TF is also able to assemble datasets out of WebDAV collections through a custom made Jena assembler. When a DAV collection is accessed through a dav assembler, all of its documents are read as RDF. Those documents that contain valid RDF are put in an RDF dataset. A DAV dataset opens and reads WebDAV documents in memory and writes them back through WebDAV when closed. For SPARQL queries, graph names must be absolute IRIs (contain a valid scheme prefix, like iri:foo). IRIs are URL (percent) decoded when loaded and URL encoded when written back, to conform to filesystem name conventions.

In the ideal world of clouds, all IRIs are absolute. In working practice, it is convenient to use temporary names that only make sense locally. The dummy scheme prefix iri: is provided in TF as a convenience for this purpose. The TF dav assembler adds the dummy prefix when loading a relative filename into the assembler, and removes it when writing the model back to file.

tfcopy "--asm=dav --path=foo/bar/ --name=baz" --user=name --pass=word

The following default command aliases exemplify the use of dav assembler datasets. Command aliases can be defined by users.

tfget --user=name --pass=word dav:/doc/ get doc from user name's home in dav
tfget --user=name --pass=word list dav list user name's home in dav
tfget --user=name --pass=word "get dav foo" get foo from user name's home in dav
tfget --user=name --pass=word "drop dav foo" delete foo from user name's home in dav
tfget --user=name --pass=word "load dav foo doc" load doc to graph foo in user name's home in dav
tfget --user=name --pass=word "replace dav foo doc" replace contents of foo with doc in user name's home in dav

When user is given, a relative path starts from the user's home directory in dav.

More details in the section on the DAV directory configuration .

TDB

TDB is a component of Jena for RDF storage and query. TDB can be used as a high performance RDF store on a single machine. TF supports a Jena TDB native triple store database that maintains RDF graphs in indexed data files on the file system. It can be used by the TermFactory utilities and services through the assembler interface.

The Jena TDB package may be installed in or linked to $TF_HOME/tdb. The default location of the native tdb database files is $TF_HOME/tdb/DB. The location is specified in the assembler file /etc/assemblies/tdb.ttl.

The TDB package comes with a set of commandline utilities documented in the Apache tdb wiki:

tdbloader Bulk loader and index builder. Performs bulk load operations more efficiently than simply reading RDF into a TDB-backed model.
tdbloader2 Bulk loader and index builder. Faster than tdbloader but only works on Linux and Mac OS/X since it relies on some Unix system utilities.
tdbloader3 Bulk loader and index builder. Faster than tdbloader but not as fast as tdbloader2; however, it only requires Java so it will work anywhere that Jena can be used.
tdbquery Invoke a SPARQL query on a store. Use --time for timing information. The store is attached on each run of this command, so timing includes some overhead not present in a running system.
tdbdump Dump the store in N-Quads format.
tdbstats Produce statistics for the dataset. See the TDB Optimizer description.

Some tdb commandline examples follow. The tdb directory is assumed to be $TF_HOME/tdb, with the TDB database under it in DB. tdbquery resolves relative file URLs against the current working directory; TermFactory resolves them against TF_HOME when the --file option is false.

tdbclean    cleans the database by removing data files from the tdb storage folder:
              rm -f "$DIR"/*.idn
              rm -f "$DIR"/*.idx
              rm -f "$DIR"/*.dat

tdb_cmd  tdbconfig  tdbdump

tdbdump --loc=DB

tdb_init  tdbloader  tdbloader2

tdbloader2 --loc=DB ../owl/fi-TFS.owl
17:46:23 -- TDB Bulk Loader Start
17:46:23 Data phase
17:46:25 INFO loader :: Load: ../owl/fi-TFS.owl -- 2013/01/25 17:46:25 EET
17:46:25 INFO loader :: Total: 929 tuples : 0,49 seconds : 1 880,57 tuples/sec [2013/01/25 17:46:25 EET]
17:46:25 Index phase
17:46:25 Index SPO
17:46:25 Build SPO
17:46:26 Index POS
17:46:26 Build POS
17:46:27 Index OSP
17:46:27 Build OSP
17:46:28 Index phase end
17:46:28 -- TDB Bulk Loader Finish
17:46:28 -- 5 seconds

tdbloader2  tdb  tdbnode  tdb_path  tdbquery

tdbquery --loc=$TF_HOME/tdb/DB "select distinct ?g where {graph ?g {}}"
--------------
| g          |
==============
| <wide.ttl> |
--------------

tdbquery --loc=$TF_HOME/tdb/DB "select * from <wide.ttl> where { ?s ?p ?o }"
-------------------------------------------------------------
| s                 | p                 | o                 |
=============================================================
| <http://koe/this> | <http://koe/is>   | _:b0              |
| _:b0              | <http://koe/a>    | _:b1              |
| _:b1              | <http://koe/long> | <http://koe/path> |
-------------------------------------------------------------

tdbstats  tdbtest  tdbupdate  tdbverify

tdbclean $TF_HOME/tdb/DB

Common dataset queries and updates can be run as TF queries from command line.

command legend
tfquery -q=list --asm=tdb table of names of named graphs in tdb
tfquery -q=lint --asm=tdb table of names of nonempty named graphs in tdb
tfquery -q=table3 --asm=tdb table of triples in default graph of tdb
tfquery -q=graph3 --asm=tdb graph of triples in default graph of tdb
tfquery -q=count3 --asm=tdb count triples in default graph of tdb
tfquery -q=table4 --asm=tdb table of quads in any named graph of tdb
tfquery -q=graph4 --asm=tdb graph of triples in any named graph of tdb
tfquery -q=count4 --asm=tdb count of quads in any named graph of tdb
tfquery -q=table4 "--asm=tdb --name=foo" table of quads in named graph foo in tdb
tfquery -q=graph4 "--asm=tdb --name=foo" graph of triples in named graph foo in tdb
tfquery -q=count4 "--asm=tdb --name=foo" count of triples in named graph foo in tdb
tfquery -q=graph4 -i=urn:x-arq:DefaultGraph --asm=tdb graph of triples in default graph of tdb
tfquery -q=graph4 -i=. --asm=tdb ditto
tfquery -q=graph4 -i=urn:x-arq:UnionGraph --asm=tdb graph of triples in any named graph of tdb
tfquery -q=graph4 -i=.* --asm=tdb ditto
tfquery -q=list -i=foo --asm=tdb table of names of named graphs in tdb matching foo
tfquery -q=load3 -i=foo --asm=tdb load triples from foo to default graph of tdb
tfquery -q=load4 -i="tdb foo" -F bar load triples from bar to named graph foo of tdb
tfquery -q=drop -i=foo --asm=tdb drop named graph foo from tdb
tfquery -q=replace -i="tdb foo" -F bar replace named graph foo with bar in tdb

Queries list and lint differ in whether the graph is required to be nonempty. Listing TDB stores only works with list. Query list lists all DAV documents, lint only the RDF documents.

A TDB datastore graph name can be any string, but a SPARQL update query (ADD, DROP, ...) argument must be an absolute IRI (have a valid scheme prefix), for example DROP <iri:foo>. The TF facilities automatically add the dummy prefix iri: in front of names that do not have a prefix.

Triples are saved as a dav document in the following example query:

tfquery -Q='INSERT DATA {
    <http://example/book1> <http://purl.org/dc/elements/1.1/dc:title>   "A new book" ;
                           <http://purl.org/dc/elements/1.1/dc:creator> "A.N.Other" .
  }' -c dav:/books

Dataset queries and updates can also be carried out via TermFactory webapp and saved as TF aliases. The following commands defined in default location-mapping.n3 rephrase the above queries.

tfget "list tdb" tfget "get tdb" tfget "list tdb foo" tfget "get tdb foo" tfget "load tdb foo bar" tfget "replace tdb foo bar" tfget "drop tdb foo" tfget "get tdb urn:x-arq:DefaultGraph" tfget "get tdb ." tfget "get tdb urn:x-arq:UnionGraph" tfget "get tdb .*"

The default assembly file for a TDB dataset is shown below. Compared to Jena out of the box, there is a new vocabulary namespace ja4tf. In the TF assembler, the property ja4tf:root true is used to locate the assembly root.

The Jena TDB unionDefaultGraph flag controls what counts as the default graph of the dataset. When the flag is on, the default graph is the union of the named graphs. Only named graph triples (quads) are included in query results, the default graph (if any) is ignored. When it is off, plain triples go into a separate default graph, and plain queries answer just those triples, ignoring triples in the named graphs.

@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix ja4tf: <http://tfs.cc/etc/asm/> .

## Example of a TDB dataset

## Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB   rdfs:subClassOf ja:Model .

# A TDB dataset used for RDF storage
## TF dataset root is of type ja4tf:Root
<#dataset> rdf:type tdb:DatasetTDB , ja4tf:Root ;
    tdb:location "./tdb/DB" ;
    tdb:unionDefaultGraph true ;   # Optional
    .

TDB with free text retrieval

Jena ARQ provides a facility for indexing text literals in a RDF dataset with the free text query engine Lucene. Text retrieval is a much faster way of finding occurrences of individual words in the data than RDF query with regular expression filtering. Regular expression search in a given string is fast as such, but regular expression search in RDF data involves going through all literal triples one by one just in case they might contain a given string, and that is slow. Text indexing does that work ahead of time for words (tokens) found in the data, making lookup fast for the indexed tokens. (There is no fast way to search arbitrary substrings in RDF data yet.) The free text retrieval facility is also available in TermFactory as explained below.

Defining a datastore with free text query

etc/assemblies/text-tdb.ttl is a sample assembly description for a text-indexed TDB datastore.

@prefix :     <http://localhost/jena_example/#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:  <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:   <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .

## Example of a TDB dataset and text index

## Initialize TDB store
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB   rdfs:subClassOf ja:Model .

## Initialize text query index
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset     rdfs:subClassOf ja:RDFDataset .
# Lucene index
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
# Solr index
text:TextIndexSolr   rdfs:subClassOf text:TextIndex .

## ---------------------------------------------------------------

:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    text:index   <#indexLucene> ;
    .

# A TDB dataset used for RDF storage
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "./tdb/DB" ;
    tdb:unionDefaultGraph true ;   # Optional
    .

# Text index description (location of the Lucene index given as an absolute URL)
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:///home/lcarlson/termfactory/CF/TF/Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index (defines which fields are indexed for text query)
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
    ) .

The moving parts of the assembly are the location of the directory for Lucene index files and the entity map that lists the properties with literal text values that should be indexed for text retrieval. The default is to search in values of rdfs:label properties. Alternative or additional text valued properties can be added to the entity map for TF specific fields. Alternatively, a reasoner can be applied to map TF specific fields to rdfs:label. (exp:text is a subproperty of rdfs:label in the TF schema.)
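
For example, indexing exp:text literals in addition to rdfs:label could be done with a variant of the entity map along the following lines (a sketch; the extra field name "exptext" is hypothetical, and the exp: prefix is assumed to be declared as in other TF documents):

@prefix exp: <http://tfs.cc/exp/> .

<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
        [ text:field "exptext" ; text:predicate exp:text ]
    ) .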

Creating an index

What happens in free text indexing is that an inverted index of the RDF store is built offline. The indexer creates a small "document" for each RDF triple matching a description in the entity map. The index associates tokens (parsed substrings) in the indexed literals with the "documents" in which they occur, making it fast to retrieve the triples in which the tokens occur. What items can be queried depends on how the text is parsed. The default is to parse the text for more or less English-like word tokens.

textindexdump.sh --desc=$TF_HOME/etc/assemblies/text-tdb.ttl
jena.textindexdump --desc=/home/lcarlson/termfactory/CF/TF/etc/assemblies/text-tdb.ttl
Doc: 0
 uri = http://dbpedia.org/resource/Pool_barb
 text = Pool barb
Doc: 1
 uri = http://dbpedia.org/resource/Pool_barb
 text = Puntius sophore
Doc: 2
 uri = http://dbpedia.org/resource/Pool_barb
 text = Puntius sophore
Doc: 3
 uri = http://dbpedia.org/resource/Pool_barb
 text = Puntius sophore

A lucene index can be created offline for a given text database with the commandline script textindexer.sh. If the index is created at the root of TF_HOME, that is where the script must be run as well. (The location of the index is specified in the assembly, as shown above.) The resulting index can be viewed with the command line textindexdump.sh --desc=$TF_HOME/etc/assemblies/text-tdb.ttl.

$TF_HOME> textindexer.sh --desc=$TF_HOME/etc/assemblies/text-tdb.ttl
Querying text

Below is an example of a full-text query. The query lists names of subject resources whose text label contains the token (separate word) 'pool' (case insensitive). The text query is given in the form of a triple. The property to search is mapped to an index field in the entity map (text:field); the default field is given by text:defaultField. The value of the triple is a search condition (Lucene query). It can be just a token in quotes, or an RDF list of a property, a search string, and an optional limit (integer) or threshold (fraction). (See documentation.) Further RDF triples can be added to the query to narrow down the query and fill out the query results.

# text-query.sparql
#en retrieve resources whose label has a full text match
#fi tekstihaku varantojen nimikkeistä
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>
SELECT ?s # ?label
{
  ?s text:query 'pool' ;
# ?s text:query (rdfs:label 'Pool' 10) ;
#    rdfs:label ?label .
}

Here is a sample query and its result:

tfquery -q=home:/io/scripts/text-query.sparql --asm=text-tdb
2014-03-28 15:08:30,374 [main] IO INFO - map home:/io/scripts/text-query.sparql ==> file:///home/lcarlson/termfactory/CF/TF/io/scripts/text-query.sparql notry false hops 0
2014-03-28 15:08:30,414 [main] IO INFO - query keeping repo because keep is true
2014-03-28 15:08:30,416 [main] IO INFO - map text-tdb ==> file:///home/lcarlson/termfactory/CF/TF/etc/assemblies/text-tdb.ttl notry false hops 1
2014-03-28 15:08:30,432 [main] IO INFO - file:///home/lcarlson/termfactory/CF/TF/etc/assemblies/text-tdb.ttl read imports ignored model size 24
2014-03-28 15:08:31,149 [main] IO INFO - Query_Results ( 1 answer/s limit 1000 )
Query_Results ( 1 answer/s limit 1000 )
-------------------------------------------
| s                                       |
===========================================
| <http://dbpedia.org/resource/Pool_barb> |
-------------------------------------------
http://localhost/TermFactory/query?&q=home%3A%2Fio%2Fscripts%2Ftext-query.sparql&r=--asm%3Dtext-tdb&Z=2014-03-28T13:08:30.362Z
Query readAll_false Query_Results ( 1 answer/s limit 1000 )

SDB

Here are some examples of the use of the SDB commandline utilities.

sdbquery --query $TF_HOME/etc/scripts/list.sparql
----------------------------------------------------
| g                                                |
====================================================
| <urn:koepala>                                    |
| <home://home/user/termfactory/CF/TF/io/mine.ttl> |
----------------------------------------------------

sdbdump --out=TURTLE --graph=file:/home/user/termfactory/CF/TF/io/mine.ttl
@prefix : <http://example/> .

<http://koe.fi/this> <http://koe.fi/is> <http://koe.fi/mine> .
<http://koe.fi/rest> <http://koe.fi/is> <http://koe.fi/ours> .

OWLIM-SE

OWLIM Standard Edition comes with the OpenRDF Sesame Workbench, a Tomcat web service for creating, querying and updating Sesame based RDF repositories. In order to be able to use both TF and the Workbench to manage the same OWLIM repository, you may create the repository with the Workbench. The simplest option is to create a db "owlim" in the Workbench. Example settings for the Workbench to match the sample owlim assembler etc/assemblies/owlim.ttl (replace $TF_HOME with the value of the environment variable):

Storage folder: $TF_HOME/owlim/owlim-storage
License: $TF_HOME/owlim/OWLIM_SE.license

The Workbench runs as root, so an owlim repository created by the Workbench may need to have its permissions adjusted for it to be accessible to a user running TF:

sudo chgrp -R user owlim-storage
sudo chmod -R g+w owlim-storage

OWLIM SE also provides a sesame console tool for similar purposes:

$TF_HOME/owlim/sesame_owlim/bin/console.sh
14:58:11.688 [main] DEBUG info.aduna.platform.PlatformFactory - os.name = linux
14:58:11.693 [main] DEBUG info.aduna.platform.PlatformFactory - Detected Gnome window manager on Posix platform
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> help.
For more information on a specific command, try 'help <command>.'
List of all commands:
help        Displays this help message
info        Shows info about the console
connect     Connects to a (local or remote) set of repositories
disconnect  Disconnects from the current set of repositories
create      Creates a new repository
drop        Drops a repository
open        Opens a repository to work on, takes a repository ID as argument
close       Closes the current repository
show        Displays an overview of various resources
load        Loads a data file into a repository, takes a file path or URL as argument
verify      Verifies the syntax of an RDF data file, takes a file path or URL as argument
clear       Removes data from a repository
serql       Evaluates the SeRQL query, takes a query as argument
sparql      Evaluates the SPARQL query, takes a query as argument
set         Allows various console parameters to be set
exit, quit  Exit the console
> connect /home/user/termfactory/CF/TF/owlim.
Disconnecting from default data directory
Connected to /home/user/termfactory/CF/TF/owlim
> show r.
+----------
|SYSTEM
|owlim-se-test ("OWLIM-SE test repository")
+----------
> create owlim-se.
Please specify values for the following variables:
Repository ID [owlim-se-test]:
Repository title [OWLIM-SE test repository]:
License file (leave blank for evaluation): /home/user/termfactory/CF/TF/owlim/OWLIM_SE.license
Base URL [http://example.org/owlim#]:
Default namespaces for imports(';' delimited):
Entity index size [200000]:
Entity ID bit-size [32]:
Imported RDF files(';' delimited):
Repository type [file-repository]:
Rule-set [owl-horst-optimized]: empty
Storage folder [storage]: owlim-storage
Use context index [false]: true
Total cache memory [80m]: 256m
Main index memory [80m]: 128m
Use predicate indices [false]:
Predicate index memory [0]:
Full-text search memory [0]:
Full-text search indexing policy [never]:
Full-text search literals only [true]:
Cache literal language tags [false]:
Enable literal index [true]:
Index compression ratio [-1]:
Check for inconsistencies [false]:
Disable OWL sameAs optimisation [false]:
Enable query optimisation [true]:
Transaction mode [safe]:
Transaction isolation [true]:
Query time-out (seconds) [-1]:
Throw exception on query time-out [false]:
Enable shutdown hooks [true]:
Read-only [false]:
WARNING: you are about to overwrite the configuration of an existing repository!
Proceed? (yes|no) [no]: yes
Repository created
> quit.
Disconnecting from /home/user/termfactory//CF/TF/owlim
Bye
user@tf-exia:/opt/owlim/owlim-se-5.2.5563/sesame_owlim/

Remember to give the storage folder path as owlim-storage in the sesame console; otherwise the jena assembler will create one by that name and not find the storage created by the sesame console. NOTE: the repo gets created only when it is first opened.

The following shorthand scripts are provided in directory io/script .

  • show-owlim <store> lists the owlim repositories (runs sesame console command show r)
  • list-owlim <store> shows owlim repository file listing
  • unlock-owlim <store> removes the repository lockfile
  • empty-owlim <store> empties the store completely (runs sesame console command clear)
  • drop-owlim <store> removes the repository completely (runs sesame console command drop)

OWLIM script getting-started/example.sh may be used to populate an OWLIM repository from files. The following command line sample preloads the OWLIM repository described by assembler $TF_HOME/etc/assemblies/wordnet.ttl with the files linked to directory $TF_HOME/owlim/preload/wordnet.

~/termfactory/CF/TF/owlim/getting-started/ ./example.sh config=$TF_HOME/etc/assemblies/wordnet.ttl preload=../preload/wordnet queryfile=none
12:23:16 Using parameters:
12:23:16 chunksize=500000 config=/home/user/termfactory/CF/TF/etc/assemblies/wordnet.ttl
...
12:28:15 ===== Shutting down ==========
12.2.2013 12:28:16 com.ontotext.trree.big.AVLRepository shutdown
INFO: NumberOfStatements = 6508563
12.2.2013 12:28:16 com.ontotext.trree.big.AVLRepository shutdown
INFO: NumberOfExplicitStatements = 5459234
12.2.2013 12:28:17 com.ontotext.trree.sdk.a.d b
INFO: Shutting down plugins...
~/termfactory/CF/TF/owlim/getting-started/

An owlim repository can declare prefixes in its owlim.properties file, for example, $TF_HOME/owlim/repositories/owlim-se-test/owlim-storage/owlim.properties . There is a sample file $TF_HOME/etc/owlim-tfs.namespaces that declares the TF namespaces in the expected format.

Directory $TF_HOME/owlim/owlim-loader contains a bulk loader based on the owlim getting-started application. A script to run it is $TF_HOME/owlim/owlim-loader.sh. The command line looks like this.

./owlim-loader.sh config=$TF_HOME/etc/assemblies/wordnet.ttl preload=$TF_HOME/owlim/preload/wordnet directory=$TF_HOME/owlim

The command loads the rdf file contents of the preload directory in a local repository in the given directory, taking the repository configuration from the config file.

SPARQL endpoints

A SPARQL endpoint is a web address exposing a SPARQL query (and/or update) service that answers queries on some dataset(s) accessible to the service.

More precisely, a SPARQL endpoint is a conformant SPARQL protocol service as defined in the SPARQL 1.1 Protocol specification. A SPARQL endpoint enables users (human or other) to query a knowledge base via the SPARQL language. Results are typically returned in one or more machine-processable formats. Therefore, a SPARQL endpoint is mostly conceived as a machine-friendly interface towards a knowledge base. Both the formulation of the queries and the human-readable presentation of the results should typically be implemented by the calling software, and not be done manually by human users.

W3C has a list of SPARQL endpoints. Some endpoints also provide localization labels alongside ontology information, mostly in the form of rdfs:label properties. TermFactory should be able to harvest such multilingual information as follows.

A SPARQL endpoint can be addressed with an appropriate query to extract localization information from it. For DBPedia and FactForge, localization information is associated directly to ontology concepts as language tagged rdfs:label properties.
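For instance, a query along the following lines could harvest labels for a single resource from the DBpedia endpoint; the resource URI and the language codes here are only illustrative.

# illustrative label-harvesting query against a public endpoint
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
WHERE {
  <http://dbpedia.org/resource/Pool_barb> rdfs:label ?label .
  FILTER ( lang(?label) IN ("en", "fi", "zh") )
}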

The TermFactory query utility and the webapp provide an option for forwarding a TermFactory query to another SPARQL endpoint.

TermFactory SPARQL endpoint

TermFactory also provides a standard SPARQL 1.1 compliant SPARQL endpoint at webapp address /TermFactory/sparql . It uses the TermFactory query servlet to serve standard SPARQL protocol queries.

Small models

Small models

OWL does not impose any built in constraints on how a larger ontology should be split into smaller parts. Good TF practice may dictate some conventions. One natural division is between language independent concept ontologies (which may come from a third party source) and language dependent term ontologies, multilingual or language specific. A systematic naming convention for naming localizations is proposed further below.

Ontology query and reasoning is sensitive to repository size. In TF, repositories can get big, but queries and imports should remain of manageable size. This suggests a modular "small models" approach. But managing a swarm of small models in the cloud can become quite complex too. Ways of keeping indexes and caches to help search the distributed contents are in demand. Here are some of the options.

  • Cache query results and edits in TF repository databases
  • Mirror ontology documents in local TF repository directories
  • Cache entry html pages on content management systems

Different types of cache can be pre-populated on a regular basis from the base ontologies on the basis of popular queries. One type of popular query is a terminological entry. There are several possible ways of indexing and caching ontology documents and small TF models for quicker retrieval.

The concept of factored ontologies was applied already in the predecessor of TF, the 4M project ontology, which built a hierarchy of increasingly specific domain ontologies starting from a set of common concepts through a series of sector specific ontologies (networking and diesel engines in 4M) to ontologies of company specific terms or product names (Windows XP and Wärtsilä ship diesels). Each language-independent ontology has corresponding term ontologies associated to it. The conversion of the BioCaster disease ontology to TF splits Biocaster ontology into a number of domain specific ontologies: diseases, locations, etc., plus a set of term ontologies in different languages to match the domain ontologies.

In TF, a further layer of factoring of domain independent expression ontologies suggests itself. An expression ontology might split into dialect independent and dialect specific parts, for instance for English or Brazilian, or by expression type or part of speech.

The division of content by such natural dividing lines should allow focusing search for relevant content to those sub-ontologies that are most likely to contain it. A consistent naming discipline for the URLs of such sub-ontologies could obviate or alleviate the need of a separate index (catalog or registry) to tell where to look for resources.

Imports, datasets, listings

OWL imports provide a way to extend OWL documents with other OWL documents. Imports are coded as triples inside an ontology, so they form part of a given ontology. There are flags and settings to control the inclusion or exclusion of imports in particular queries or globally.

RDF does not have a concept of imports like OWL. For SPARQL queries, there is the notion of an RDF dataset. An RDF dataset consists of one default graph and a number of named graphs. Datasets can be stored in RDF databases or collected on demand in memory from a list of document URLs. Imports make sense when the dependencies between collections are fixed. RDF datasets are simpler than ontology importing in that the collection is not a tree but a flat list (there are no inclusions). A SPARQL query from a dataset containing named graphs can refer to the graphs by name, and the answer can consist of quadruples (quads) where the fourth member is the name of the containing (aka context) graph.
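As a minimal illustration, a SPARQL query over such a dataset can address its named graphs with the GRAPH keyword and return the graph name alongside the matched triples:

# list a few quads: each row names the graph that contains the triple
SELECT ?g ?s ?p ?o
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
LIMIT 10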

A third way to pool things together is provided by TF listings. A listing is a text file with a dot-separated file suffix given in TF_LIST_EXT (default value .tsv ), containing names or addresses of resources separated by whitespace. Compare also collections . An alias pointing to a listing can be prefixed by the same extension followed by a colon (default lst: ). A query that evaluates to a listing can be marked with the querystring final dummy parameter z=.tsv .

TermFactory allows listings in many places to specify more than one resource at a time.

Listings can be generated by TF query using a SELECT query and the tab separated values (TSV) format. For instance, the query

http://localhost/TermFactory/query?q=home:/etc/scripts/select-tfs-ontologies-from-ont-policy.sparql&f2=TSV&z=.tsv

produces a listing of the TF schema ontologies registered in a site's ontology policy. The filetype parameter z=.tsv at the end of the query url tells the query engine to parse the query answer as a listing. The parameter name z is just a dummy, undefined and ignored by QueryForm. In effect, the filetype parameter z=.tsv resembles Unix command line backticks in that the results of one query are used as inputs to another.

Filetype parameters of form z=.foo are usable generally to indicate the return filetype of a query url. In general, the return type of a query url cannot be easily figured out in any other way, given nesting of queries and aliasing.

TF address aliasing

The TF location mapper helps nickname a long URL with a free form alias, known as TF address. One application of this is using a clean url in place of a TF query url. Such an alias need not occur as a resource URI in ontology triples at all. Or an alias may occur only as the object of import triples to identify an ontology. An alias relativizes Semantic Web addressing orthodoxy or Leibnizian duality of resources and ontologies to a given set of reference ontologies. An alias is a nickname to a resource that points to a subset of its properties obtained from a given set of repositories.

For instance,

http://localhost/tfs.cc/ont/ctryCode

can have an alias

http://localhost/tfs.cc/ctry/ont/ctryCode

This alias is location mapped to a URI query whose path identifies the home ontology of the resource.

[ rdf:type tf:Alias ;
  tf:mapping [ tf:pattern "http://tfs.cc/ctry/(.*)" ;
               tf:altPattern "http://localhost:8080/TermFactory/query?uri=$1&r=http%3a%2f%2ftfs.cc%2fowl%2fctry%2fTFCtry.owl" ] ] .

We may single out aliases by tagging them with the class tf:Alias as shown. Since location mappings are RDF, they can be queried with TermFactory. The following TermFactory query selects the aliases from the location mapping configuration file.

tfget http://localhost/TermFactory/query?q=home:/etc/scripts/select-query-aliases.sparql
Query Results (2 answer/s limit 100)
-------------------------------------------------------------------------------------------------------------------------
| s    | p                              | o                                                                              |
=========================================================================================================================
| _:b0 | http://tfs.cc/alias/altPattern | "http://localhost:8080/TermFactory/query?uri=$1&r=http%3a%2f%2ftfs.cc%2fowl%2fctry%2fTFCtry.owl" |
| _:b0 | http://tfs.cc/alias/pattern    | "http://tfs.cc/ctry/(.*)"                                                      |
-------------------------------------------------------------------------------------------------------------------------
message: location http://localhost/TermFactory/query?q=home:/etc/scripts/select-query-aliases.sparql

Query aliases are used in the Mediawiki TF editor interface to map Mediawiki page titles to TF web urls.

Aliasing localizations

For automating queries it is useful to have a systematic naming convention between resources/ontologies and term ontologies that localize them. One way to systematically name localizations is a TF address composed of a scheme prefix of form fi-foo for a language code prefix like fi and ontology address foo . Addresses of this form may be mapped to localization ontologies corresponding to each ontology.

Both "big" ontologies and resources (entries) can have localized projections accessible through a localization alias. A localization alias for a designation fi-exp:en-lion-N could map to a TF SELECT query that returns a listing of its translations, including exp:fi-leijona-N .

Sample term/expression/string localization queries are shown below. The last example shows a multilingual localization alias (prefix x-exp ) abbreviating the query above it.

pellet4tf query -F -q home:/io/sparql/select-term-equivalents.sparql -f TSV -i term:en-number-N_-_exp-number -F2 ../owl/tf-TFS.owl
?term
<http://tfs.cc/term/en-number-N_-_exp-number>
<http://tfs.cc/term/fi-luku-N_-_exp-number>
<http://tfs.cc/term/zh-数-N_-_exp-number>

pellet4tf query -F -q home:/io/sparql/select-exp-translations.sparql -f TSV -i exp1:en-number-N ../owl/tf-TFS.owl
?exp
<http://tfs.cc/exp/en-number-N>
<http://tfs.cc/exp/fi-luku-N>
<http://tfs.cc/exp/zh-数-N>

tfget x-exp1:en-number-N
?exp
<http://tfs.cc/exp/en-number-N>
<http://tfs.cc/exp/fi-luku-N>
<http://tfs.cc/exp/zh-数-N>
message: location http://localhost/TermFactory/query?a=1&q=file%3aio%2fsparql%2fselect-exp-translations.sparql&r=http%3a%2f%2ftfs.cc%2fowl%2ftf-TFS.owl&f2=TSV&i=exp1:en-number-N

Here are sample localization aliases. The first alias below is used in the last translation example above.

# localization aliases
[ rdf:type tf:alias ;
  tf:mapping [ tf:prefix "x-exp" ;
               tf:altPrefix "http://localhost/TermFactory/query?a=1&q=file%3aio%2fsparql%2fselect-exp-translations.sparql&r=http%3a%2f%2ftfs.cc%2fowl%2ftf-TFS.owl&f2=TSV&i=exp" ] ] .

[] tf:mapping [ tf:pattern "fi-.*" ;
                tf:altPattern "http://tfs.cc/owl/fi-TFS.owl" ] .

Mirroring another site's ontology documents

A TF site may want to mirror contents of another site in its own document index and use TF aliases to redirect requests for these documents to the local copy. Retrieval is faster but some mechanism is needed to keep the copies up to date (see section on version checking ).

Semantic Web addressing orthodoxy recommends that a site's (say tfs.cc) own official read-only files be accessible directly at the site's document root, so that for tfs.cc, for instance, the locations of its own ontology files are at

  • /owl/... for ontology files
  • /ont/... for concept entry files, etcetera

If a site mirrors another site, then read only copies of a mirrored site, say tfs.cc mirroring grapson.com, could be accessible as tfs.cc/grapson.com/...

A site may want to mirror edited but not yet officially published versions of its own and other sites' documents in a web writable dav directory for medium-term saving. These collections could be addressed thus:

  • tfs.cc/dav/localhost/... for own documents
  • tfs.cc/dav/grapson.com/... for other site's documents

This proposal creates an asymmetry because localhost is not on the path for a site's own official documents. We don't need or want tfs.cc/localhost/owl, because tfs.cc is the localhost. For the official names of resources, the shorter the address the better. But we prefer tfs.cc/dav/localhost/owl/... for the dav versions of a site's own files, instead of the shorter tfs.cc/dav/owl/... There are two (admittedly not completely knockout) arguments for this proposal. First, the symmetry avoids parsing errors, since the directory structure is not dependent on the naming. Second, it avoids omission errors when the local dav collection is to be mirrored elsewhere.

Indexing by aliasing

We might want an aliasing discipline for identifying resource collections. What it would accomplish is make an IRI like http://tfs.cc/ont/ctryCode resolve in the TF server to a description of just that particular resource. Typically, this URL points to the result set of a DESCRIBE query on the resource.

Such resource aliases could form virtual collections. The home collection of http://tfs.cc/ont/ctryCode is the collection at http://tfs.cc/ont/ . The URL http://tfs.cc/ont/ctryCode should fetch a DESCRIBE result (entry) for concept ctryCode from that collection. This can be accomplished in different ways. The simplest solution is to store an entry at the location. Another way is to use server URL rewriting. A third way is to let http://tfs.cc/ont/ctryCode point to a web directory. The directory holds cached versions of the entry in different formats. The directory index (index.php) returns one of the files in the directory.

The URL http://tfs.cc/ont/SubjectField/Geography/ is an example of an ontology URL that collects entries for geographical concepts. This indexing URI uses the TF domain hierarchy to define a path for indexing entry documents. Entry http://tfs.cc/ont/Place could be found indexed in this collection as http://tfs.cc/ont/SubjectField/Geography/Place . The advantage of such indexing is that concepts can be found with a minimum of machinery. A drawback is that we need machinery to make sure that changes in the indexing ontology are reflected in the indexing URIs. For automating the indexing, see CopyService .

A naming convention is then one possible indexing into the collection. It would be too narrow to fix on just one index here. We should keep in mind that a resource can have many URIs pointing to it. One approach is to develop a naming convention based on the [scheme:][//authority][path][?query][#fragment] URI structure with three zones:

  • site ID, which includes the (scheme if relevant), authority and a (possibly empty) prefix of the path
  • indexing zone, which includes a redundant (possibly empty) middle zone of the path
  • resource ID, which includes some suffix of the uri, including the resource's local name.

If slash vocabulary is used, the local name is the last path element. This is preferred in TF in the name of Semantic Web addressing orthodoxy .

The site ID and resource ID together identify a resource by its home site and name. The home site is the one that has authority to create and delete resources in that namespace. A TF resource like a concept, term, or expression is sufficiently identified by a family name, the home site, and its official given name/s. Between the family name and the given name/s, there may be an indexing zone consisting of optional path elements, analogous to nicknames, as in William Frederick "Buffalo Bill" Cody . Thanks to the indexing zone, a resource can be identified by many URIs in the home namespace, meaning it can have many aliases in its home repository.

The indexing zone allows for aliases for a resource URI which are sufficient but not necessary for identification, i.e. any ID of form [repoID][indexZone][resourceID] points to the same resource independent of what the index zone contains. The index zone can then constitute alternative directory trees for the repository ontologies/subsets that allow pre-selecting parts of the ontology according to different criteria. (Different paths can also point to the same contents.) The naming convention then provides another indexing device to prefabricated results of ontology queries.

The indexing zone could also be used to associate a resource with its defining ontology using TF aliases. This avoids the need to create a separate catalog for the purpose.

Directory index

A TF resource URI like http://tfs.cc/ont/ctryCode might get described by a TF DESCRIBE query http://localhost/TermFactory/query?uri=ont:ctryCode . Say we want to store a copy of this description ("entry") in the grapson.com site filesystem so it can be retrieved by its URL. One convention is to save the entry as a file in one or more of the TF formats in a resource directory pointed at by http://grapson.com/tfs.cc/ont/ with the local name plus an appropriate suffix, say http://grapson.com/tfs.cc/ont/ctryCode.html . The place to code this (or some alternative) addressing strategy is in location mappings .

The TF WebDAV directory manager stores ontology documents (including individual entries or collections) by URI path, so as to reflect the position of the entry or collection in some TF class taxonomy or other. Given alternative aliases for alternative taxonomies, the same item(s) can be indexed under many alternative paths.

An example of creating a directory index of precomputed entries is the following. The work is done by perl script io/bin/urlindex

pellet query -e ARQ -q classes.sparql ../owl/TFS.owl > TFS.uris
lists classes in TFS.owl
sudo -E $TF_HOME/io/bin/urlindex TFS.uris
creates and populates the index

The resulting directory tree is shown below.

Show/hide TF url index

For another example, an entry for China (the country) could be indexed by subject field under URI http://tfs.cc/ont/SubjectField/Geography/Country/China along with other entries for countries in the same folder. A (cross) classification can be multiply indexed by different sort orders and a deep one by different degrees of granularity using alternative directory paths, according to need and size of the collections. A third example is an indexing by subclass structure as in the toy example below.

  • Human
    • Boy
    • Girl
    • Man
    • Woman
    • Female
      • Girl
      • Woman
      • Adult
        • Woman
      • Young
        • Girl
    • Male
      • Man
      • Boy
      • Adult
        • Man
      • Young
        • Boy
    • Adult
      • Man
      • Woman
      • Female
        • Woman
      • Male
        • Man
    • Young
      • Boy
      • Girl
      • Female
        • Girl
      • Male
        • Boy
Building a class directory index

An example of building a directory index based on subclass hierarchy is the following. The work is done by three perl scripts in io/script :

classindex
builds a directory structure matching subclass hierarchy in /var/www under owl:Thing
classquery
runs TF DESCRIBE queries for all subclasses in the hierarchy
classlink
creates links from the directories in the index to the matching entries in owl:Thing

The steps to create the index structure are as follows.

pellet4tf classify ../owl/TFS.owl > TFS.classify
use pellet to classify the ontology
classindex TFS.classify > TFS.classindex
create a shell command file to create the directory tree
sudo sh TFS.classindex
pellet4tf query -e ARQ -q classes.sparql > TFS.classes
use pellet4tf to list the classes in the ontology
sudo -E $TF_HOME/io/bin/classquery TFS.classes
use pellet4tf to create entries for all classes at the root of the directory tree
sudo -E $TF_HOME/io/bin/classlink
link the entries at root to the directory tree

The resulting directory tree is shown below.

Show/hide TF class index

Filesystem index

Keeping an index of preprocessed and classified query results may sometimes be faster or at least more robust than ontology queries against large repositories. Again, there are many alternatives:

  • TF database repository index
  • TF webDAV repository index
  • Separate indexes or registries
Database index

A TF database holds result sets of TF URI queries as Jena models using the URI of the query as model ID. One can populate the database with pre-fetched queries and use the database's indexing capabilities to retrieve relevant items from the db.

TF webDAV repository index

TF IndexService provides for copying remote (or local) entries or ontologies in a WebDAV web directory indexed by the URI path.

Full text index

TF repositories can be indexed with generic text indexing tools (Google, Lucene).

CMS cache

A content management system like Drupal can hold the results of common queries (for instance, the established entries) in html rendering for browsing in its own story database.

Resource catalogs

When TF data is distributed over sites that are not centrally managed, locating helpful TF resources is bound to be more like web search or peer-to-peer resource sharing than centralized database lookup. In the best possible future, the task is facilitated by general purpose web data search facilities outside TF itself, such as those developed in the Linked Data initiative. When working on a more local scale, dedicated community wide TF index repositories can be established. The organizational problem of keeping such indexes up to date is more demanding than the technical problem of setting them up.

Searching the whole TF cloud for matches may become slow. The usual solution to speed up search problems is to build catalogs or indexes on the data: sparse projections that can be searched faster. One can run periodic offline queries to generate from full term ontologies sparser resource catalogs, projections which only tell what terms/expressions are defined and where they are defined. First we need to harvest the TF web for the ontologies to index. Some way of knowing which TF sites there are to survey is needed. It may be a listing of TF sites kept on a common collaborative site. Alternatively, a separate registry like Neon Toolkit's Oyster could be harnessed for this purpose.

Assume we have a list of the relevant sites. Each TF site has an ontology policy that registers the ontologies it manages. These policies can be queried with the TF query engine to produce listings of the repositories. (There is a sample query in home:/etc/scripts/select-defined-ontologies-from-ont-policy.sparql .)
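The sample query itself is not reproduced here, but a minimal sketch of the idea, assuming the ont-policy file uses the Jena document manager vocabulary, might look like this:

# sketch: list registered ontologies and their local copies from an ont-policy file
PREFIX jom: <http://jena.hpl.hp.com/schemas/2003/03/ont-manager#>
SELECT ?publicURI ?altURL
WHERE {
  ?spec a jom:OntologySpec ;
        jom:publicURI ?publicURI .
  OPTIONAL { ?spec jom:altURL ?altURL }
}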

Instead of loading and querying the repositories all at once, it makes more sense to stagger the search by providing local projections of the repositories to support given types of search. Indexes of defined terms and designations in given TF ontologies can be generated by query. Here are sample such queries:

pellet4tf query --ignore-imports -F -q sparql/construct-defined-terms.sparql --name=http://tfs.cc/owl/fi-TFS.owl > fi-TFS-defined-terms.ttl
pellet4tf query --ignore-imports -F -q sparql/construct-defined-exps.sparql --name=http://tfs.cc/owl/fi-TFS.owl > fi-TFS-defined-exps.ttl

The ontologies that result from the queries consist of rdfs:isDefinedBy triples like this:

@prefix term: <http://tfs.cc/term/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

term:fi-käsitepiirre-N_-_meta-conceptDataProperty
    a term:Term ;
    rdfs:isDefinedBy <http://tfs.cc/owl/fi-TFS.owl> .
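The actual construct-defined-terms.sparql is not reproduced here; a minimal sketch of a query producing such an index might look like the following, where the ontology URI would in practice be supplied as a parameter:

# sketch: index the terms defined in a given ontology with rdfs:isDefinedBy
PREFIX term: <http://tfs.cc/term/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {
  ?t a term:Term ;
     rdfs:isDefinedBy <http://tfs.cc/owl/fi-TFS.owl> .
}
WHERE {
  ?t a term:Term .
}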

Analogously, resource catalogs can be produced with queries like

pellet4tf query --ignore-imports -F -q scripts/construct-isdefinedby-index-for-terms.sparql --named=http://tfs.cc/owl/fi-TFS.owl > fi-TFS-terms.ttl
pellet4tf query --ignore-imports -F -q scripts/construct-isdefinedby-index-for-designations.sparql --named=http://tfs.cc/owl/fi-TFS.owl > fi-TFS-designations.ttl

The ontologies that result from the queries consist of rdfs:seeAlso triples like this:

@prefix term: <http://tfs.cc/term/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

term:fi-käsitepiirre-N_-_meta-conceptDataProperty
    a term:Term ;
    rdfs:seeAlso <http://tfs.cc/owl/fi-TFS.owl> .

The following query creates a term index out of all of the ontologies registered in a site's ont-policy.

pellet4tf query -F -q scripts/construct-isdefinedby-index-for-terms.sparql -F2 --named=http://localhost/TermFactory/query?q=home:/etc/scripts/select-tfs-ontologies-from-ont-policy.sparql&f2=TSV&z=.tsv

An alternative is to make an index of named graph projections of all typed resources in some set of managed ontologies in a database repository and use the repository as the index.

Resource catalogs can be associated to the ontologies they project using aliases derived from the names of the repositories, say idx:foo .

Given a suitable index database, it remains to automate finding hits from the index, and using the result set and the associated repos to generate entries with DESCRIBE queries. The solution is likely to involve boilerplate queries that take the hit list and the associated repositories as parameters. Here is one possible indexing setup.

  1. collect term/designation definition (rdfs:isdefinedby) and/or hit (rdfs:seeAlso) indexes over relevant TF repositories in an owlim database dedicated for that purpose.
  2. devise a query which looks up a string in one of the above indexes and runs the relevant describe queries on the hits, using as repos the indexed ontologies. The query could take the form
    query -U INPUT -i {query URIs from index for regex}.tsv -p={query repos per item}.tsv {shared repos}

The following command creates an index for the native ontologies listed in ont-policy.rdf (i.e. those whose schema is the TF Schema). The listing is obtained with a query that reads the names of the relevant ontologies from ont-policy, and the iterated (boilerplate) query forms an index out of each and loads it to the resource catalog index .

pellet4tf query -F -i "http://localhost/TermFactory/query?q=home:/etc/scripts/select-tfs-ontologies-from-ont-policy.sparql&f2=TSV&z=.tsv" -q sparql/construct-typed-resources-from-url-by-url.sparql --store=index

The following commands list designations whose base form contains 'term' from an index database and graphs containing such designations in the database, respectively:

pellet4tf query -F -q etc/scripts/select-designations-by-iri-r&f2=TSV" -i -.*term --store=index
pellet4tf query -F -q etc/scripts/select-graphs-for-designations-by-iri-r&f2=TSV" -i -.*term --store=index

These queries can also be packed to query strings using the pellet4tf query --pack (-P) option and given to a browser:

http://localhost/TermFactory/query?i=-.*term&q=home:/etc/scripts/select-designations-by-iri-r.sparql&r=index%2b&f2=TSV&z=.tsv
http://localhost/TermFactory/query?i=-.*term&q=home:/etc/scripts/select-graphs-for-designations-by-iri-r.sparql&r=index%2b&f2=TSV&z=.tsv

For more convenience, the query aliases below take these queries down to idx:exp1:-.*term.tsv and idx:exp2:-.*term.tsv respectively.

[ rdf:type tf:Alias ;
  tf:mapping [ tf:pattern "idx:exp1:(.*).tsv" ;
               tf:altPattern "http://localhost/TermFactory/query?i=$1&q=home:/etc/scripts/select-designations-by-iri-r.sparql&r=index%2b&f2=TSV&z=.tsv" ] ] .

[ rdf:type tf:Alias ;
  tf:mapping [ tf:pattern "idx:exp2:(.*).tsv" ;
               tf:altPattern "http://localhost/TermFactory/query?i=$1&q=home:/etc/scripts/select-graphs-for-designations-by-iri-r.sparql&r=index%2b&f2=TSV&z=.tsv" ] ] .

The following command uses the above aliases to generate HTML entries for those designations in the indexed ontologies which contain the string 'term'.

pellet4tf query -f HTML -T exp -U idx:exp1:-.*term.tsv idx:exp2:-.*term.tsv

The above query packed to a query string with option --pack is this:

http://localhost/TermFactory/query?D=idx%3Aexp1%3A-.*term.tsv&r=idx%3Aexp2%3A-.*term.tsv&f=HTML&a=1

Finally, the whole index query workflow is wrapped into one alias:

[ rdf:type tf:Alias ;
  tf:mapping [ tf:pattern "idx:exp:(.*).html" ;
               tf:altPattern "http://localhost/TermFactory/query?D=idx%3Aexp1%3A$1.tsv&r=idx%3Aexp2%3A$1.tsv&f=HTML&a=1" ] ] .

With the above abbreviations in place, an entry for words known to index database index whose IRI matches string pattern X is accessible at the TF address idx:exp:X.html .

Recall the problem of ontology hell for TF: how to avoid inadvertent replication of work when creating new TF terms. The descriptive naming convention helps, but we still want ways to harvest the true and false friends of terms. Boilerplate queries over resource catalogs or other databases help automate the search for similar terms, and query aliases help hide the ugly details. It remains to automatise the indexing, so that the indexes stay up to date as the ontologies change. The automatisation is likely to involve similar solutions, as well as remote location mappings, outlined in the next section on revision control.

Revision control

Revision control (also known as version control, source control or (source) code management (SCM)) is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may be changing the same files. Changes are usually identified by a number or letter code, termed the "revision number", "revision level", or simply "revision". For example, an initial set of files is "revision 1". When the first change is made, the resulting set is "revision 2", and so on. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and with some types of files, merged.

Version control systems (VCS) are most commonly stand-alone applications, but revision control is also embedded in various types of software like word processors (e.g. Microsoft Word, OpenOffice.org Writer, KOffice, Pages, Google Docs), spreadsheets (e.g. OpenOffice.org Calc, Google Spreadsheets, Microsoft Excel), and in various content management systems. Integrated revision control is a key feature of wiki software packages such as MediaWiki, DokuWiki, TWiki, etc. In wikis, revision control allows for the ability to revert a page to a previous revision, which is critical for allowing editors to track each other's edits, correct mistakes, and defend public wikis against vandalism and spam.

Term management does not call for real time databases with shared access, transactions, record level locking, and suchlike. Terminology management is a slower process of collaborative editing on something like a wiki platform (which of course can be backed up or under revision control). Once approved, terminology is moved to a read only section of the database, where it gets updated through batch updates. Ontologies as web documents can have and surely need both version information and version control. Version information on TF ontologies can be provided using OWL annotation properties . Beyond that, TF revision control relies on pre-existing repository specific tools for versioning document, memory, and database representations of ontologies. For managing OWL annotations, see Version checking .

OWL revision control literature contains many proposals to log ontology change history explicitly and in full detail as OWL annotations or meta ontologies ( Redmond et al. ). Versioning systems like the one described in the Redmond et al. paper (parsia.com) use neither a syntactic nor a semantic approach, but a pragmatic one, as relations between versions are defined by a history of user actions. An item is related to its predecessors through a history of editing changes by a user. We avoid going down this route. In our view, ontology versions should be compared on their current merits, whatever their editing history. Instead, we are thinking in terms of two extremes:

  1. syntactic: standard serialisation of ontologies plus existing third party source revision control systems like svn.
  2. semantic: a facility to run semantic diffs using a reasoner. Horrocks et al. (ref.) show how to test entailment in OWL by running a tableau proof on a source ontology and the complement of a target ontology.

If the tableau construction succeeds, the complement of the closure contains facts not entailed by the source. The semantic diff of two ontologies is their symmetric difference (exclusive or). The first extreme documents editing changes, the second extreme semantic changes.

Ontology tools like editors allow the user to manage working versions of the ontology under offline development. Persistent ontology revision control, which allows retrieving and comparing older versions of the same ontology, is best handled with dedicated version control systems outside of the ontology proper. For databases (like mySQL ) there are ways to back up and recover (restore) snapshots of the database content either as a whole or incrementally (see HowtoForge ).

Web content management platforms and wikis (like MediaWiki ) have their own version control solutions. (Cf. a discussion of version control for web content.)

The TF project software and ontologies are kept under the Subversion revision control system. Subversion has extensions for revision management through websites or local filesystems (e.g. Tortoise SVN for Windows, eSVN for Linux).

Scope of version information

The OWL standard documentation mainly considers version annotation per ontology, but notes that the owl:versionInfo property can be used to annotate classes and properties as well as ontologies. With TF versioning, a question of the granularity of version annotation arises (cf. source indications ). Versioning per ontology may be too coarse, while versioning on every triple is too fine. Some conventions for annotation property assignment and property inheritance are called for, with rules for conflict resolution for when a triple inherits more than one source or version indication. In XML based entries like TBX, administrative information is scoped by the term entry tree. Semantic Web reasoners enable stating property inheritance using some rule language. Two main approaches present themselves for TF.

  • scoping by query. An entry query retrieves annotation triples relevant for the entry. The scope of the triples is implicitly defined by the structure of the entry.
  • inheritance by rule. Annotations for a triple are inferred by a reasoner using explicit rules .

The approaches can be combined. An entry query collects the annotations, and rules distribute them to the component triples. The conventions, as well as needs, may vary here from one ontology to another.
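As a rough sketch of scoping by query, an entry query could collect, alongside the entry triples, the version annotations of the ontologies that define the entry resource. The resource URI below is illustrative, and the actual entry queries may use a different linking property.

# sketch: an entry CONSTRUCT that also picks up ontology-level version annotations
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {
  <http://tfs.cc/ont/ctryCode> ?p ?o .
  ?ont owl:versionInfo ?version .
}
WHERE {
  <http://tfs.cc/ont/ctryCode> ?p ?o .
  OPTIONAL {
    <http://tfs.cc/ont/ctryCode> rdfs:isDefinedBy ?ont .
    ?ont owl:versionInfo ?version .
  }
}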

Normal forms

This section considers normal forms for TF ontology documents. A normal form is a unique choice among equivalent representations. Reduction to normal form by term rewriting is what many reasoners in effect do. Tableau reasoners virtually reduce a model to disjunctive normal form (DNF). Resolution reasoners reduce to conjunctive normal form (CNF). Normal forms exist for many different logics, and they can be made unique by sorting terms. Reduction to normal form reduces equivalence comparison to textual identity. More generally, normalization converts semantic reasoning to syntactic processing. Runtime is saved at the cost of offline compilation and storage.

Carroll/Stickler 2004 is an early proposal for an xml normal form for rdf triples. Dau 2006 discusses normal forms for rdf graphs. He defines two extreme normal forms for rdf graphs as concerns the proportion of nodes to triples. One extreme is a normal form where each fact is represented as a separate triple. The other extreme is a representation where every node appears just once. The duality is similar to that between nested tree (graph or matrix) and nonnested (path or mrs) representations of feature structures or Turtle files.

What about OWL? OWL ontologies are theories in DL, which has a sizable theory of equivalence. There are also a number of proposed CNF normal forms for some description logics ( Hitzler/Eberhart 2007 , Bienvenu 2008 ).

The most promising semantic normal form for TF may be the prime implicant normal form or its dual, the prime implicate normal form. The prime implicant normal form is basically a pruned syntactic version of a tableau (Hintikka model set) construction. (Dually, the prime implicate normal form is a pruned clausal normal form.) Bienvenu 2008 defines PINF for the modal logic K, but it should be extensible to other modal logics with finite model property (Bull and Segerberg ref), including KB, KT and K4. The construction is not efficient (possibly exponential in time and space), but once in (suitably sorted) PINF, two ontologies can be compared syntactically for semantic differences.

Entry normal form

The TF schema allows many equivalent traversals of the model graph. For one thing, every property has an inverse. This redundancy can be exploited in conversion. A conversion script can choose which way to traverse a source document to produce a sufficient set of axioms to generate the rest. A related notion of normal form, analogous to relational database normal forms, is provided by TFS profiles .

Another such normal form is the form produced by the TF HTML writer to display entries. The TF entry writer templates decide the traversal, and a schema can bridge third party content to TFS. The current TF2HTML writer uses the Pellet reasoner online to generate the traversal, but for efficiency, only does type level (TBox) reasoning on the schema model. For faster production of entries from an ontology, it would be better not to have to call a general purpose reasoner at runtime to fill out missing structure. Reasoners are too unfocused and slow for the purpose. Instead, we may write a special purpose reasoning process to apply offline to normalise an ontology for term extraction. In practice, such special purpose reasoning can also be carried out by sparql scripts. Scripts etc/scripts/invp.sparql and etc/scripts/dirp.sparql generate from an ontology inverse property triples corresponding to direct property triples in it, and vice versa. See also factor utility option --direct .
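The scripts themselves are not reproduced here; a minimal sketch of an inverse-completion query of this kind, assuming the inverses are declared with owl:inverseOf in the schema, could be:

# sketch: generate inverse property triples for triples whose property has a declared inverse
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT { ?o ?q ?s }
WHERE {
  ?s ?p ?o .
  { ?p owl:inverseOf ?q } UNION { ?q owl:inverseOf ?p }
}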

TF3 normal form

For syntactic revision control, it seems useful to have a textual normal form for a TF ontology that can be maintained by existing text-based versioning systems. The idea is to use svn or some such tool to run diffs on triple files. Knowing the differences, they can be visualised in a graph representation of the ontology (say, by colouring).

There is no agreed unique serialisation of RDF or OWL that would allow reliable textual comparison of two ontology documents. OWL APIs and editors based on them do not in general guarantee constant printout either, probably because the serialisation depends on implementation and runtime dependent factors (like hashmaps). (Note: more recent editors show progress on this count. ) TF undertakes to produce at least some normal form. The simplest one may be a sorted triple file. To obtain a unique XML/RDF serialisation, a graph traversal order needs to be fixed, plus the grain of the units to compare. A fine grain still readable to human eye should be preferred for debugging.
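A sorted triple listing is easy to produce with a query; for instance, the following SELECT orders all triples lexically. As the next paragraph explains, ordering alone does not yet solve the blank node problem.

# dump all triples in a fixed lexical order (blank node labels remain unstable)
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
ORDER BY ?s ?p ?o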

We start from the triple representation of an ontology, sort and rewrite it into a unique normal form. Sorting triples is not enough, we need some way to compare blank nodes between versions. Ontology read/write routines normally factor blanks in order to avoid accidental capture of anonymous variables across ontologies. In versioning, we want to match blank nodes between versions if possible. (Jena blank node ID's are long hex numbers (uuid's) which have a good chance of staying distinct but they have not got the persistence guarantee of uris.)

We first considered having a Jena triple reader/writer pair that writes sorted triple files without renaming blank nodes. That does not help much, as long as most tools (justifiably) don't respect blank identity. Identifying blank nodes between versions goes against the grain of RDF, for blanks are supposed to be anonymous. Adding to each blank a tf:nodeID property (as in [ tf:nodeID "1" ] ) might work in principle, but gives a messy look. Also some ontology tools may not handle such extra labels well. A third approach was to invent some canonical numbering of blanks. To test the idea, we first wrote a text based script which parses a triple file, sorts it so that blanks are treated as equal, and then factors the blanks in the sort order with a running number. The experimental script sort.perl produced identical results from two triple file printouts of TFS.owl .

Carroll [Car2003] presents an algorithm for generating canonical names for blank nodes in order to obtain a canonical ordering of the triples of a (possibly slightly modified) RDF graph. TF3 uses a variant of the same idea. The TF3 format is a special case of the N-TRIPLES (NT) format where blank IDs are standardised using a variant of Carroll's idea and the lines are sorted alphabetically. The newline character in multiline literals is escaped as "\n".

The TF3 (TF triple) normal form of a TF ontology aims to minimise free variation between versions, so that standard textual diffs and their visualisations can point out the different triples. With TF3 we can pinpoint version changes made to an ontology (on the RDF graph level) textually using file comparison utilities like diff . The differences can now also be visualised graphically with the homespun RDF graph visualizer TFVisu .

The current TF3 writer encodes blanks by their first occurrence in a standard sort of the rdf model. The blank node ID is an encoded copy of the triple where the blank first occurs. The encoded triples are output in alphabetic order. This normal form is close to the RDF triple representation of an OWL ontology, which can be a plus and a minus. See the factor utility for details. TF3 is a special case of Turtle format, so that TF3 files can be read by a Turtle reader.

Since RDF equivalence (graph isomorphism) is NP complete, producing a normal form for which the equivalence test is linear must be NP complete. The hard cases involve graphs with blank triples. Identical TF3 form is a sufficient (sound) but not a necessary (complete) test for equivalence. For typical TF terminology test cases, writing an ontology into a TF3 file and back produces the identical TF3 file.

TF model diff

TermFactory uses RDF reification to losslessly compress an original RDF model and its edited version into one model. The TF model diff is also an RDF model: it lists the triples common to the original and the edits (the commons) as such, and reifies the symmetric difference into a set of reified statements (quads) for those triples that belong to the original but not the edits (the deletes) and another set of quads for those triples that belong to the edits but not the original (the adds). Though a diff is just another RDF model, it has a special semantics for TF editing. The semantics can be expressed with the following algebraic notation. Infix plus and minus stand for model add/remove (triple set union/difference), * stands for model meet (triple set intersection), prefix plus/minus stand for reification as added/deleted quads, and |.| for dereification of quads into plain triples.

diff == -deleted+common+(+added) diff model consists of deleted quads, common triples, and added quads
added = |+added| added triples are reified as quads
deleted = |-deleted| deleted triples are reified as quads
+added1+(+added2) = +(added1+added2) reification and model booleans commute
-deleted1+(-deleted2) = -(deleted1+deleted2) ditto
orig(diff) == deleted+common original consists of deleted and common triples
edits(diff) == common+added edits consist of common and added triples
added(diff) = added
deleted(diff) = deleted
common(diff) = common
orig(orig) = orig plain original
edits(edits) = edits plain edits
diff(orig,edits) == -(orig-edits) + edits*orig + +(edits-orig) diff construction from original and edits
diff(diff1,diff2) = diff(orig(diff1),edits(diff2)) diff of diffs
apply(active,diff) == active-deleted+added diff is applied by removing deletes and adding adds
compose(diff1,diff2) == diff(orig(diff1),apply(edits(diff1),diff2)) compose defined with apply
apply(active,compose(diff1,diff2)) == apply(apply(active,diff1),diff2)
edit(active,orig,edits) == active + edits-orig - orig-edits == apply(active,diff(orig,edits)) edit operation defined with apply
apply(active,diff) == edit(active,orig(diff),edits(diff)) == edit(active,orig(diff),diff) apply operation defined with edit
inv(diff) == diff(edits(diff),orig(diff)) inverse diff
apply(apply(active,diff),inv(diff)) == active inverse diff undoes diff
diff(null,edits) == edits-null + edits*null - null-edits == +edits
diff(orig,null) == null-orig + edits*null - orig-null == -orig
diff(edits,edits) == null + edits - null == edits unchanged
diff(diff,diff) == diff(orig(diff),edits(diff)) = diff
diff(diff,inv(diff)) == diff(orig(diff),orig(diff)) = orig(diff)

TF entry diff

The TF edit utility and the HTML entry writer provide a way of displaying version differences. The following command writes an HTML entry which displays the differences of two entries. The versions are merged and displayed as an HTML entry with the deleted (original) content stricken out and the new (active) content underlined.

tfedit ChinaOld.html ChinaNew.html --diff > ChinaDiff.html
Show/hide TF entry diff

The diff ChinaDiff.html

The original ChinaOld.html

The edits ChinaNew.html

The following command roundtrips an HTML diff document through the HTML writer. The HTML diff entry contains both the original and the edits, so the first parameter of the diff operator is left empty. What the command does is extract the original and the edits from the diff and feed them to a new diff write.

tfedit -F - ChinaDiff.html --diff

The diff operator is equivalent to a longer sequence of operations shown in the following command line. Here, the tfedit add operator merges the original entry for China (ChinaOld.owl) with the edited version (ChinaNew.owl). The HTML entry writer's original and edits options are used to supply the original and the edited ontologies to the HTML writer to mark the differences. The --deleted=true option tells the writer to include the deleted triples to the diff.

tfedit -F --format=HTML --deleted=true --original=ChinaOld.owl --edits=ChinaNew.owl ChinaOld.owl ChinaNew.owl --add > ChinaDiff.html

The HTML read flags deleted and added filter in or out of an HTML diff the triples tagged with the corresponding HTML class tags. The following commands retrieve the above diff, the original, and the edits, respectively, starting from the diff. The default value of --deleted is false and --added is true, so what is shown from a diff by default is the edits.

tfget -F --deleted=true --format=HTML ChinaDiff.html
tfget -F --deleted=true --added=false --format=HTML ChinaDiff.html
tfget -F --format=HTML ChinaDiff.html

The same roundtrip shown earlier can also write its result to a new diff file:

tfedit -F - ChinaDiff.html --diff > diff2.html

Entry write date and user

The HTML writer records the date and user of the write of a TF document in a meta element in the header of the document in the XML date format . This information is preserved in conversion of the document to other TF formats as datatype properties meta:date and meta:user .

Version checking

TF repositories may mirror or cache copies of ontology models or entries obtained from other repositories. This raises the question how downstream servers know when an ontology has changed. For named documents, the version can be checked by a remote location mapping query. For query results, a query can collect owl:versionInfo triples which are compared with the home site's ontology policy. Implementation not completed.

Location mapping queries

To find out about ontology version changes, a site can query another site's location mappings. A location mapping query works as follows. The home server maps the public persistent url http://tfs.cc/owl/TFS.owl in its ont-policy file to the latest official versioned url, say http://tfs.cc/owl/2.0/TFS.owl . The mirror server maps the persistent url to a remote location map query from the home server. The returned url is mapped to a local version in the mirror if the mirror has it. If not, the newer home server version is fetched and perhaps cached locally. Compare conventions in the OWL 2 recommendation .
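A sketch of such a remote location mapping query, using the mapping vocabulary shown earlier: the persistent URL is illustrative, and the exact mapping properties used for one-to-one mappings may differ from the pattern-based ones shown here.

# sketch: ask the home site's published mappings where a persistent URL currently points
PREFIX tf: <http://tfs.cc/alias/>
SELECT ?alt
WHERE {
  ?m tf:mapping [ tf:pattern "http://tfs.cc/owl/TFS.owl" ;
                  tf:altPattern ?alt ] .
}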

A query (for instance a TF DESCRIBE query) can also consult several ontologies and cache the result. If any one of the consulted ontologies changes, the cached result becomes stale. One way of solving this last problem is for TF DESCRIBE queries to collect owl:versionInfo triples from each consulted ontology with the cached model to know which ontology versions it has imported, and compare that list to version info stored in the ont-policy files of the imported ontologies in the appropriate TF server(s). The following details how this might happen.

The owl:versionInfo element identifies the current version of the public URI on this server. For now, it does not matter just what the versionInfo element says; the current checker only tests for string equality. So far, there is no provision for there being more than one versionInfo element per URI, or more than one ontology specification for a given uri in the ont-policy file. The default TF DESCRIBE queries are set up to collect versionInfo elements to the query result. A cached entry might begin as follows:

<rdf:RDF xml:base="http://tfs.cc/owl/TFS.owl" >
  <owl:Ontology rdf:about="">
    <owl:versionInfo rdf:datatype="&xsd;string">version 0.0 31.10.2008</owl:versionInfo>
  </owl:Ontology>
  ...

When TF option TF_VERSIONINFO is true, the TF tfget facility compares the owl:versionInfo triples found in the database model to those specified in ont-policy.rdf . If some URI occurs both in the cached model and in the ont-policy file associated to different owl:versionInfo values, the query engine rejects the cached entry and makes a new DESCRIBE query.
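A sketch of the kind of check involved: list the owl:versionInfo values recorded in a cached model, so they can be compared with the values registered in ont-policy.rdf.

# sketch: list ontology version annotations found in a cached model
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?ontology ?version
WHERE {
  ?ontology a owl:Ontology ;
            owl:versionInfo ?version .
}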

To allow cross-server versionInfo checks, (a public version of) the ont-policy file of each server should be publicly accessible from the web at home:etc/ont-policy.rdf .

TF tools

TF terminologies are maintained with the help of various user interfaces, services and tools. These include TermFactory specific tools, collaborative terminology platforms with TF plugins, and third party professional ontology tools. The offline tools are described in the user manual.


TF utilities

The TermFactory specific commandline utilities include

tfget
locates ontologies
tfquery
does ontology queries
tfedit
does ontology editing
tfcopy
copies ontologies across locations
tfactor
renames resources and reformats documents
pellet4tf
pellet reasoner for TF

Get utility

The tfget utility helps use and test a TF site's location mapping policy.

TermFactory name and address aliases can be arbitrary strings: just foo or this is a test are good TF aliases. TF aliases are resolved by the TF tfget utility.

Although any string is good for a TF address, RDF applications may expect standard compliant IRIs. The minimum requirement for an IRI (or URN) is that it contains a scheme part, i.e. the scheme separator colon : occurs in it.
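For instance (the names below are only illustrative):

this is a test a valid TF alias, but not an IRI (no scheme)
urn:tf:test a minimal IRI: it contains the scheme separator colon
http://tfs.cc/owl/TFS.owl a full URL, also a valid TF address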

The java class for applying TF aliases is com.grapson.tf.rev.jena.Get.java . io/bin/tfget is a command line wrapper script for it.

Without arguments, or with the --help switch, tfget shows the command line usage:

Usage: --help(=options|level|format|style) (-h) --map=ADDRESS --imports (-i) --try(=ADDRESS) (-t) | ADDRESS

Format options are shown with the following command:

tfget --help=format
--format= GF HTML JSONLD N3 NT (N-Triples) Properties RDF/XML RDF/XML-ABBREV TBX TURTLE

tfget --help=HTML
--format=HTML --template=ADDRESS --root=<NAME> --schema=ADDRESS --active=ADDRESS --original=ADDRESS --edits=ADDRESS --lion=ADDRESS --lang=<ISO langcode> --lynx=ADDRESS --skin=ADDRESS --tree --(readonly|deleted|added|blanks)=(true|false)

Often the slowest part of a query is the initialization. With the --keep=true (-k) option, tfget loops, executing with the same engine instance against different commandline arguments until end of file (^D on Linux) or --keep=false. tfget only reads one source address per round. Toggles invert their value at each successive read unless set explicitly to true or false; flags take no value and can only be turned on.
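A sketch of such a looping session (the addresses are only examples):

tfget --keep=true
http://tfs.cc/owl/TFS.owl --format=TURTLE
home:/etc/ont-policy.rdf --format=RDF/XML
--keep=false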

The tfget utility's commandline options are explained below. The same options apply throughout TF unless explicitly overridden.

Specific options:

option short form description values
--imports -i list imports of ontologies at given address flag
--try -t resolve (TF location map) address/es flag
--untry -y unresolve (TF location map) address/es flag

Gate options:

option short form description values
--encoding character encoding UTF-8 (default), UTF-16
--pass credentials for TF access string
--conf -C address of a TF properties file for a user conf address
--user credentials for TF access string

Main options:

option short form description values
--conv (none) apply named converter on download Java class name
--factor -W apply a resource renaming operation flag or picklist (see --help=options)
--file -F resolve relative filename arguments on commandline to working directory toggle
--format -f output format picklist (see --help=format)
--glob -g output global names (full URL/IRI) toggle
--help -h get help about options options|level|format|style
--imports -i list imports of ontologies at given address flag
--job -w run command in background (results in download address) toggle
--keep -k loop command (keep session values) toggle
--level -v set logging level picklist (see --help=options)
--map use given location map address
--notry -n no location mapping (default false) toggle
--out -o copy/upload tfget output to given address. address
--over -x overwrite target address in copy/upload. toggle
--pack -P construct TF query url out of command line (instead of executing it) toggle
--prefixes -p model to copy known prefixes from address
--readAll -a read ontologies with imports toggle
--time -Z toggle execution timing. toggle
--try -t resolve (TF location map) address/es flag
--untry -y unresolve (TF location map) address/es flag
--writeAll -b write ontologies with imports toggle

HTML options:

option short form description values
--active -A active ontology (to which changes get committed) address
--added include added triples when reading HTML diff into RDF. toggle
--blanks Convert blanks to skolem constants or back. toggle
--deleted include deleted triples when reading HTML diff into RDF. toggle
--edits -E version after editing (for showing differences) address
--lang -l localization language ISO langcode
--lion -L localization vocabulary address
--lynx -H hyperlink map address
--original -O version before editing (for showing differences) address
--readonly include readonly triples when reading HTML diff into RDF toggle
--root -R list of instances/classes to include as roots in entry address
--schema -S schema to map user vocabulary to HTML template address
--template -T RDF template to define HTML entry structure
--tree suppress property cycles in HTML layout toggle

The location mapping task --try (-t in short) tests location mappings. tfget --try foo shows where foo gets mapped by the TF location mapper in use. --untry inversely tries to find an alias that resolves to a given address. Only name and prefix aliases are untried; pattern aliases are just skipped. The option --file=true (-F) tells a command line utility to resolve schemeless source file addresses against the current working directory before passing them along. Another --file=false switch toggles this behavior off. tfget $TF_HOME/etc resolves the same as tfget home:/etc, except that the former applies the commandline user's file permissions, while the latter applies TF file access control.

The imports task --imports (-i) finds and lists the ontologies imported by the given url. tfget --imports foo loads foo with imports and extracts the names of all the ontologies imported by it.

Usage examples:

tfget del tdb http://tfs.cc/owl/TFS.owl remove TFS.owl from tdb triple database
tfget del .\* remove all from tdb triple database (note escape to save star from shell)
tfget list tdb home:/\w{8}[-]\w{4}[-]\w{4}[-]\w{4}[-]\w{12} list named graphs in tdb that match the UUID format
tfget http://tfs.cc/owl/TFS.owl --format=TURTLE locate TFS schema ontology and write it in TURTLE format
tfget --try tfs find what alias tfs stands for
tfget --untry http://tfs.cc/owl/TFS.owl find alias that stands for http://tfs.cc/owl/TFS.owl
tfget --try --map=wnlmap.tsv wn30:synset-1530s-noun-1 find where wn30:synset-1530s-noun-1 gets mapped using location mapping config files in listing wnlmap.tsv
tfget lion.json --schema=TFS.owl --format=TURTLE --factor=relabel convert a TF localization json file back to a TF ontology relabeling blanks with descriptive identifiers

If only one document is requested, its mimetype and file extension (if any) do not conflict with the required format (if any), and no factors or imports are required, tfget tries to download the document as is. This case covers any kind of file, not just RDF models. A factor can be forced for RDF content with the --factor=true (-W) option.

Contrariwise, whenever multiple documents, factors or imports are requested, or the mimetype or file extension of the document conflicts with the required format, tfget tries to read the document/s into an RDF model.
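For instance (the file names are hypothetical):

tfget home:/images/logo.png -o logo.png download the file as is, since nothing forces an RDF reading
tfget home:/owl/sample.ttl --format=RDF/XML read the file into an RDF model, since the file extension conflicts with the requested format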

In addition to TF alias, other agents such as the web server (apache2 for instance) or the web application container (Tomcat for instance) can do their own URL rewriting.

TF addressing

TF url redirection

Factor utility

The java class com.grapson.tf.rev.jena.Factor.java rewrites TF ontologies in different formats and factors TF resource URIs. It reads an ontology file and rewrites it, adding labels and rewriting URIs. There is a shell script wrapper to Factor.java in io/bin/factor .

The query and edit utilities (and services) can apply certain labeled factor operations. Currently, the labeled operations are deblank , reblank , and relabel , which remove and restore blank nodes and create descriptive labels for anonymous TF resources, respectively.

The tfactor utility helps satisfy TermFactory's different and sometimes conflicting resource naming preferences.

Minimally, tfactor reads one or more TF files and prints a TF model in different formats. When there are more than one input file, it merges the contents of the files into one model.

tfactor --help(=options|level|format|style) (-h) --factor=blanks|format|identify|keys|names|quads|skolems (-W) --blanks | --id(=INTEGER) | --label(=PROPERTY) | --sameAs --add --create --remove --replace --from=NAMESPACE --to=NAMESPACE | --text --uuid --tf3decode --urldecode --iriencode --urlencode --tf3encode --normalize --direct --trim ADDRESS...

The main mission of tfactor is to systematically change resource names in various ways. The main task types are

--label(=<URI>) | --blanks | --text

The label task creates or removes alternative labels for resources and/or replaces resource URIs with alternative labels. The blanks task removes and restores blanks. The text task parses Turtle text files (without constructing the RDF model) and encodes resource names occurring in the triples using one of the encoding options. The tasks are modified by option switches as explained below.

option short form description values
--blanks factor anonymous resources (blank nodes). flag
--create create label from keys flag
--factor -W apply a resource renaming operation flag or picklist (see --help=options)
--id use id number integer offset (default 0)
--iridecode remove percent encoding for iri reserved characters flag
--iriencode percent encode iri reserved characters in label flag
--label use descriptive label property IRI (default rdfs:label)
--normalize normalize label to unicode normal form C flag
--remove remove old label flag
--replace replace old label flag
--tf3encode escape turtle reserved characters in label flag
--urldecode remove percent encoding from label flag
--urlencode percent encode url reserved characters in label flag
--uuid convert label to uuid flag
--direct replace inverse properties with the corresponding direct ones. flag
--indent prettyprint with given indent integer
--trim remove unused namespace prefixes flag

Usage examples:

The combination --create --label tells tfactor to form descriptive names (IRIs) for expressions and terms on the basis of their key properties, as described in the section on TF descriptive resource names. With --create --id , tfactor creates sequential numerical ids. The start count can be given as in --id=100 . Then the first new id will be _101. Option --remove removes alternative labels from the affected instances. Option --replace actually replaces the URIs of the affected resources. If --create is chosen, the newly created label is used. Otherwise the new URI is the (first best) existing alternative label of the resource. The replaced URI is saved as an alternative label on the factored instance, unless --remove is specified.

For designations and terms, a descriptive URI is generated only when the entity has the requisite property values (langCode, catCode, and baseForm or romanisation in case of designations; hasDesignation, hasReferent in the case of terms) and the namespace prefixes for the designation and referent namespaces are known.

The --blanks task factors blank nodes into temporary URIs with tfactor --remove --blanks and back with tfactor --replace --blanks. What happens is that anonymous nodes are replaced with URIs in the namespace urn:blank , and conversely, URIs in this namespace are replaced with blank ids.
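A sketch of the effect on a single triple (the generated local identifier is invented, and prefix declarations are assumed):

[] exp:baseForm "kala" .

becomes

<urn:blank:b0> exp:baseForm "kala" .

and the reverse direction restores the blank node.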

Temporary URIs are exempt from the special treatment given to blank nodes by RDF i/o and reasoning, which simplifies editing and other syntactic operations on blank triples. Compare the Jena ARQ blank node labels extension.

Option --from= gives a prefix matched with URIs to form the input to tfactor. If not given, the input is taken to be the blanks (anonymous resources) in the model. If option --to= is given, it is taken to be the namespace of the new URIs that get created. The default is the instance's original namespace, if it has one. If the instance is anonymous, the default namespace of the model (the value of the empty prefix) is used. If that is not defined, the value of configuration property TF_NAMESPACE is used.

If a source is a hyphen (-) , tfactor reads the source from standard input. This allows using tfactor as a filter (e.g. cat test | tfactor - converts test from tf3 into rdf/xml).

Some usage examples follow. The following command line provides descriptive URIs for blank node expressions and terms:

tfactor --create --label --replace source > target

The following command removes URL encoding and then applies IRI encoding, that is, it first removes one encoding and then introduces another one.

tfactor -F --replace --label --remove --urldecode --iriencode icd10de.ttl.uri > icd10de.ttl

The following command prints to standard output a relabeled version of TFS.owl in TermFactory triple format with entity URIs URL-encoded.

tfactor -F --replace --label --urlencode TFS.owl

The following command line rewrites a turtle TF3 encoded ontology into RDF/XML decoding the TF3 encoded items.

tfactor -F puls-locations.ttl --replace --label --tf3decode > puls-locations.owl

Command tfactor -F --format=TURTLE --label --uuid --replace --in=http://tfs.cc ../owl/en-TFS.owl converts tfs.cc URIs into uuids saving the replaced URIs in meta:label.

Command tfactor -F --format=TURTLE --label --create --id --replace --in=http://tfs.cc ../owl/en-TFS.owl creates running numbers for tfs.cc URIs saving the replaced URIs in meta:label.

Command tfactor -F --ttl --tf3encode source.tf3 encodes all reserved characters in a triple file, using u00 escapes in place of percent encoding. This may avoid character problems in conversion.

Command tfactor -F --ttl --normalize source.owl normalizes a file into unicode normal form C. It may be necessary to normalize incoming unicode for Jena.

The --direct option changes the direction of properties in a TF ontology so as to minimize the need to use a reasoner.

--factor=identify applies sameAs axioms to merge identical nodes. --factor=quads rewrites the merge of contexted models grouping together triples that share context.

When an ontology is properly directed, a plain SPARQL query written in terms of direct properties is sufficient to query it. Other types of variation may also need to be reduced. For instance, language and part of speech identifications may need to be normalized to language and category codes. This can be done with Mixed engine pellet4tf queries like etc/scripts/langcodes.sparql and etc/scripts/catcodes.sparql .
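A hedged sketch of such a normalization run (the data file name is hypothetical, and it is assumed that the script constructs the normalized triples, which are then redirected to a file):

pellet4tf query -F -e Mixed -q etc/scripts/langcodes.sparql mydata.ttl > mydata-normalized.ttl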

Query utility

The TermFactory TFQuery query utility is originally based on Pellet, but extends it in many ways. It has more query types; in addition to plain query and reasoning, it can fetch documents, describe resources using DESCRIBE queries, and apply a query to a given list of inputs. TFQuery loads data using a TermFactory specific ontology loader which consults the TF location mapper to resolve addresses.

tfquery
tfquery Usage: --help(=options|level|format|style) (-h) --bridge=ADDRESS (-B) --depth=INTEGER (-d) --describe-query=ADDRESS (-G) --display-query=true|false --engine=Pellet|ARQ|Mixed|SPARQL|Stacked (-e) --input=TEXT (-i) --interpretation=TEXT (-I) --joiner=REGEX (-j) --limit=INTEGER (-N) --merge=true|false (-m) --offset=INTEGER (-M) --cross=true|false (-X) --pack=true|false (-P) --splitter=REGEX (-J) --describe=NAME... (-D) ... | --queryLine=ADDRESS (-q) | --queryText=TEXT (-Q) | --try=ADDRESS... (-t) | --untry=ADDRESS (-y) | --uri=NAME... (-U) ... | --url=ADDRESS... (-u) ... --verbatim=TEXT (-V) --repo=REPO (-r) ... REPO ...

Query utility parameters over and above those listed under tfget utility are described below. Options that accept multiple names or addresses accept joiner separated lists of items, and multiple instances of such options can be given. Query parameter names 'uri' and 'url' are misnomers: TF resource names and addresses can be aliases resolvable to IRIs or URLs by TF location mappings.

Queries can be specified using the queryLine and queryType parameters alone. Then the arguments of the query are given as queryLines and the type specified by queryType. Usually it is more convenient to use query type specific options.

Specific options:

option short form description values
--bridge -B bridge schema for service or Stacked engine TF address
--depth -d description depth integer (negative means unlimited, default -1)
--describe -D describe resource/s joiner-separated list of resource names
--describe-query -G query to use in describing resources TF address
--display-query print query text to standard error (commandline) toggle
--echo -V reformat contents RDF text
--engine -e query engine SPARQL, ARQ, PELLET, Mixed, Stacked
--input -i input/s to boilerplate query joiner/splitter-separated list or table
--interpretation -I assignment of substitution parameters to inputs in boilerplate query string
--joiner -j regex to split inputs by regular expression
--limit -N max size of query result (rows/triples) integer
--merge -m merge boilerplate query results together toggle
--offset -M number of query results to skip from beginning integer
--cross -X cross multiply parameter values toggle
--queryLine -q depends on queryType. Address of SPARQL query by default joiner-separated list of names or addresses
--queryText -Q text of a SPARQL query URL encoded string
--queryType type of query DESCRIBE, ECHO, TRY, URL, URI, QUERY (default)
--repo -r repository address/es joiner-separated list of addresses
--try -t address to resolve TF address
--untry -y address to unresolve TF address
--uri -U resource name/s to describe joiner-separated list of resource names
--url -u url/s to fetch joiner-separated list of addresses


Query type

  • A DESCRIBE query describes one or more items with the given or default DESCRIBE query from given repositories.
  • An ECHO query reads given RDF contents as text and prints the contents out in the requested format.
  • A TRY query resolves the address of a document to a location URL using location mappings.
  • A URL query fetches the contents of a file of the given name using location mappings.
  • A URI query describes one or more items with the default DESCRIBE query from given repositories. Same as DESCRIBE with the HTML root set to the items described.
  • A default (untyped) query executes a given SPARQL query against given repositories.
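For orientation, here are minimal command line sketches for some of these query types (the resources and addresses are only examples):

tfquery --try tfs resolve the alias tfs (TRY)
tfquery --url http://tfs.cc/owl/TFS.owl --format=TURTLE fetch a document (URL)
tfquery --describe exp:Language --repo http://tfs.cc/owl/TFS.owl describe a resource (DESCRIBE)
tfquery --queryLine etc/scripts/triples.sparql --repo owl/epi/epi.owl run a SPARQL query (default)
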
TF query engines

The differences between the various query engines are described in the Pellet documentation as follows:

  • ARQ
    • ARQ handles the query execution
    • Calls Pellet with single triple queries
    • Supports all SPARQL constructs
    • Does not support OWL expressions
  • Pellet
    • Pellet handles the query execution
    • Supports only Basic Graph Patterns
    • Supports OWL expressions
  • Mixed
    • ARQ handles SPARQL algebra, Pellet handles Basic Graph Patterns
    • Supports all OWL and SPARQL constructs

tfquery adds two more engines: a plain SPARQL engine from Jena, and a Stacked engine. The TF plain SPARQL engine differs from the original Jena ARQ sparql engine in that it uses the TF ontology loader. Jena ARQ, as an RDF tool, does not do ontology imports. The SPARQL engine builds a dataset out of the repository sources.

If the resolved source item starts with the keystring --named= , the item (minus the keystring) is loaded into the dataset as a named graph. Otherwise, it is included in the dataset's default graph. This also applies to listings: if a listing has a name like --named=foo.lst , all items in it are loaded as named graphs.
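For example, a listing of repository sources might look like the following sketch (the addresses are illustrative): the first item would be loaded as a named graph, the second into the default graph.

--named=http://tfs.cc/owl/TFS.owl
owl/epi/epi.owl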

If the resolved source item is a store description, a dataset or named model is assembled from the specs depending on if the description contains a --name item.

If the resolved source item starts with --endpoint= , the source is taken to be a sparql service endpoint, and the query is forwarded to it. The answer from the endpoint is postprocessed using given TF settings like output formatting.

For instance, querying the multipart ontology epi.owl as such with sparql only queries the importing ontology. To include imported ontologies in the dataset, they must be listed explicitly.

sparql --data=owl/epi/epi.owl --query=etc/scripts/triples.sparql
-------------------------------------------------------------------------------------------
| s                               | p           | o                                       |
===========================================================================================
| <http://tfs.cc/owl/epi/epi.owl> | owl:imports | <http://tfs.cc/owl/epi/puls-bridge.owl> |
| <http://tfs.cc/owl/epi/epi.owl> | owl:imports | <http://tfs.cc/owl/epi/bio-bridge.owl>  |
| <http://tfs.cc/owl/epi/epi.owl> | rdf:type    | owl:Ontology                            |
-------------------------------------------------------------------------------------------

In contrast, pellet4tf loads imports if option --include-imports is set, and ignores them if --ignore-imports is set. In TF, imports are off by default. This default is taken to avoid accidentally including imported ontologies in edits. The default can be changed with the TF_READALL conf setting.

pellet4tf query -F -e SPARQL -q etc/scripts/triples.sparql owl/epi/epi.owl
2012-12-02 23:10:11,437 [main] INFO IO com.grapson.tf.que.pellet.TFQuery - Query Results (175629 answers) ...

The jena sparql utility expects the imports as a flat list on the commandline as sparql --data parameters. Such a list can be produced with TF tfget option --imports and saved as a TF listing :

tfget --imports -F ../owl/epi/epi.owl
http://tfs.cc/owl/epi/puls-basic.owl
http://tfs.cc/owl/epi/puls.owl
http://tfs.cc/owl/epi/biot.owl
http://tfs.cc/owl/epi/puls-all.owl
http://tfs.cc/owl/TFS.owl
http://tfs.cc/owl/epi/puls-bridge.owl
http://tfs.cc/owl/epi/bioc.owl
http://tfs.cc/owl/epi/bio.owl
http://tfs.cc/owl/epi/bio-bridge.owl

The --describe (-D) or --uri (-U) parameter allows shorthands like the following:

pellet4tf query -U exp:Language
pellet4tf query -U "exp:Language exp:Language"
pellet4tf query -U exp:Language,exp:Language -j ","
pellet4tf query -U http://tfs.cc/ont/Country
pellet4tf query -U "<http://tfs.cc/ont/Country>"

Prefixed names like exp:Language can be queried for prefixes defined in TF_PREFIX_URL or in the incoming data. Otherwise, use the full URI. Angle brackets around a URI are optional. If they are used, it is better to quote the string to avoid commandline parsing errors. A list of inputs must be parsed by the shell as one commandline parameter. The list may need quotation marks around it, depending on the value of the --joiner (-j) parameter.

A Pellet or Mixed engine query on anything but small datasets is often quite slow, typically due to the inefficiency of realization (rdf:type inference). The TermFactory specific STACKED engine is an attempt to remedy this. The Stacked engine first makes a DESCRIBE query with Jena SPARQL engine for the arguments of the input query, then runs the original query with Pellet MIXED engine using as dataset the results of the first round. For instance, if the input query is a SELECT query, the pre-query replaces SELECT with DESCRIBE for the first round and applies the SELECT query to the result of the DESCRIBE query.

The first stage may use the more efficient SPARQL rdf query engine, or reason with a bridge ontology to select a set of asserted triples from a third party ontology. The second stage uses as input the selected triples plus the supplied schema ontology to run the slower Pellet MIXED engine query on this (hopefully smaller but still sufficient) dataset. If a bridge ontology is specified with property TF_BRIDGE or a bridge=... query string or --bridge=... command line parameter, it will be added to the dataset for the second round.
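A hedged sketch of a Stacked engine invocation (the file names are hypothetical):

pellet4tf query -F -e Stacked -q myquery.sparql -B mybridge.ttl bigdata.ttl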

The Stacked engine is sound but not complete. The dataset extracted in the first round is not guaranteed to be safe, in the sense of pellet modularity. It may miss long distance entailments entailed by the original dataset but not by the extract made by the pre-query. For example, an instance base with statements :a :lt :b . :b :lt :c . and schema :lt a owl:TransitiveProperty . entails the triple :a :lt :c . This statement will be included in DESCRIBE :a under the Mixed engine, but not under the Stacked engine, if :b :lt :c does not happen to get included in the pre-query DESCRIBE result. With unlimited DESCRIBE depth, the Stacked engine is safe, but then the extracted model easily gets too big to allow any useful savings. The Stacked engine can be helpful if it manages to extract from a large model a sufficient subset of premises which is easier to reason with than the whole dataset. (For instance, if concepts and terms are merged in one dataset, a Stacked prequery might be staged to pick out just the terminology.)

Query bridging

Say we want to query a dataset in a third party vocabulary using TermFactory vocabulary. If the dataset is not too big, it can be loaded wholesale into a TF reasoner knowledge base, where it can be queried in TF schema vocabulary using the TF reasoner and a bridge schema that relates the repository's vocabulary to TFS. For a large repository or a third party sparql endpoint service, the download approach may not be feasible or even allowed. Nor is it usually possible to serve the bridge ontology to the remote endpoint and request it to do the reasoning for us.

In this case, one solution is to translate the native query at the TF client end into a query that is understood at the service endpoint. Such a translation may not be fully faithful, but it is better than nothing. There is no general automatic solution (as yet) to this translation task. For the common special case where the mapping between the two vocabularies holds between individual named properties or classes, TF provides a query bridging facility. A bridge ontology in this case consists of rdfs:subPropertyOf axioms that subsume the service vocabulary's property names under the property names used in the client vocabulary. When such a bridge ontology is provided to the query engine using option TF_BRIDGE or commandline/query parameter bridge (-B), TF rewrites the client query into a disjunctive query in the service endpoint vocabulary.

The following sample query uses a bridge ontology to query the Wikipedia categories of the DBPedia title Tiger from dbpedia live sparql endpoint.

pellet4tf query -F -q home:/etc/scripts/select-tf-domains-by-subject.sparql -i dbp:Tiger -B home:/owl/bridge/dbpbridge.ttl -s dbplive

The native query select-tf-domains-by-subject.sparql refers to the TF property meta:hasSubjectField :

# select-tf-domains-by-subject.sparql
#en list tf domains (subject fields) of resources (iri)
#fi luettelo varantojen aihealueista (iri)
PREFIX meta: <http://tfs.cc/meta/>
SELECT * WHERE { $INPUT1 meta:hasSubjectField ?domain }

This query finds nothing in dbpedia, not surprisingly, as dbpedia knows nothing about the TermFactory schema (as yet :). But we can attach the following bridge ontology to the query with the --bridge (-B) option:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix meta: <http://tfs.cc/meta/> .
dcterms:subject rdfs:subPropertyOf meta:hasSubjectField .

The bridged query sent to the endpoint disjoins the dbpedia subproperty dcterms:subject into the query pattern using SPARQL 1.1 property path patterns:

SELECT * WHERE { dbp:Tiger dcterms:subject|meta:hasSubjectField ?domain } LIMIT 1000

which prompts dbpedia to answer with

2013-05-04 22:10:51,157 [main] INFO IO com.grapson.tf.que.pellet.TFQuery - send to sparql endpoint: http://dbpedia-live.openlinksw.com/sparql
2013-05-04 22:10:51,405 [main] INFO IO com.grapson.tf.que.pellet.TFQuery - Query_Results ( 13 answer/s limit 1000 )
Input dbp:Tiger
Query_Results ( 13 answer/s limit 1000 )
------------------------------------------------------------------------
| domain                                                               |
========================================================================
| <http://dbpedia.org/resource/Category:Animals_described_in_1758>     |
| <http://dbpedia.org/resource/Category:Mammals_of_Indonesia>          |
| <http://dbpedia.org/resource/Category:National_symbols_of_Singapore> |
| <http://dbpedia.org/resource/Category:EDGE_species>                  |
| <http://dbpedia.org/resource/Category:Megafauna_of_Eurasia>          |
| <http://dbpedia.org/resource/Category:Carnivora_of_Malaysia>         |
| <http://dbpedia.org/resource/Category:National_symbols_of_Malaysia>  |
| <http://dbpedia.org/resource/Category:Conservation_reliant_species>  |
| <http://dbpedia.org/resource/Category:Mammals_of_Asia>               |
| <http://dbpedia.org/resource/Category:Mammals_of_Russia>             |
| <http://dbpedia.org/resource/Category:Tigers>                        |
| <http://dbpedia.org/resource/Category:Big_cats_of_India>             |
| <http://dbpedia.org/resource/Category:National_symbols_of_India>     |
------------------------------------------------------------------------
http://localhost/TermFactory/query?q=file%3Aio%2Fsparql%2Fselect-domains-by-subject-iri.sparql&i=dbp%3ATiger&s=dbplive&B=file%3Aowl%2Fbridge%2Fdbpbridge.ttl&D=2013-05-04T19:10:50.551Z
TFQuery ok http://localhost/TermFactory/query?q=file%3Aio%2Fsparql%2Fselect-domains-by-subject-iri.sparql&i=dbp%3ATiger&s=dbplive&B=file%3Aowl%2Fbridge%2Fdbpbridge.ttl&D=2013-05-04T19:10:50.551Z readAll_false Query_Results ( 13 answer/s limit 1000 ) imports false limit 1000

In the next example, the TF concept meta:SubjectField is related to the DBPedia property dcterms:subject with a range statement.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix dbps: <http://dbpedia.org/ontology/> .
@prefix dbp: <http://dbpedia.org/resource/> .
dcterms:subject rdfs:range meta:SubjectField .

When instances of class meta:SubjectField are queried from DBPedia against this bridge using the query pattern ?s a meta:SubjectField , the query is expanded to a union (disjunction):

SELECT * WHERE { { { _:b0 dcterms:subject ?s} UNION { ?s rdf:type meta:SubjectField} } }

The range restriction is included in the query, disjoined to the original query pattern. Mappings of the forms

dbps:Mammal rdfs:subClassOf ont:Mammal
rdfs:label rdfs:domain exp:Designation

work too and are bridged into unions in the same way. The first bridge will include DBPedia mammals in a search for TF mammals, and the second bridge would include all DBPedia entities with rdfs labels among TF designations (rather too much of a muchness most likely).

The following example uses an OWL role as a bridge. This bridge says that the class ont:Animal should include items related to dbp:category:Tigers by property dcterms:subject.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix meta: <http://tfs.cc/meta/> .
@prefix dbps: <http://dbpedia.org/ontology/> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix ont: <http://tfs.cc/ont/> .
dcterms:subject rdf:type owl:ObjectProperty .
dbp:Tigers owl:equivalentClass
    [ a owl:Restriction ;
      owl:onProperty dcterms:subject ;
      owl:hasValue <http://dbpedia.org/category:Tigers> ] ;
  rdfs:subClassOf ont:Animal .

When the client query pattern is ?s a ont:Animal , the bridged query becomes

SELECT * WHERE { { { ?s dcterms:subject <http://dbpedia.org/category:Tigers>} UNION { ?s rdf:type dbp:Tigers} UNION { ?s rdf:type ont:Animal} } }

Iterated queries

Iterated queries help manage distributed term collections factored into many smaller pieces, by automating workflows involving many inputs and outputs.

pellet4tf can iterate different types of query on an array of inputs. An iterated query of some type runs the query repeatedly over a list of inputs of the base type. Iterated queries are marked by an --input (-i) parameter.

The input to the iterated query is given as the value of parameter --input (-i) as a list of inputs. The input is first split by a regular expression given as --separator (-J), or taken from TF_SEPARATOR. The default is newline. If there is only one group of inputs, the iteration is over the items in the group. If there are more, the iteration is over groups. Groups are split by the regular expression given as --joiner (-j), or taken from TF_JOINER. The default is inline whitespace. Each group is fed as input to the URL or URI query in turn.

If option --merge (-m) is given, the results from the different groups are merged; otherwise the groups are processed separately. If a download URL is given with option download-url (-w), the results are written to that URL. If the results were merged into one, the merge is written. If not, the results from the different groups are written in separate files, named by concatenating the output filename and the input group, percent encoding the input group if appropriate. In this case, the download URL must be a writable database or directory.

The command below describes items in dom.tsv using the --describe (-U) option.

pellet4tf query -F -i dom.tsv -U INPUT ../owl/sfield/TFSField.owl

The command below fetches or creates descriptions for a table of items in items.tsv using the --uri (-U) option.

pellet4tf query -F -i items.tsv -U INPUT

Each group can consist of just one item. Compare the two queries

pellet4tf query -u "foo bar" pellet4tf query -i "foo bar" -u INPUT -J " "

The first query merges the models foo and bar into one model and returns the merger. The second query fetches foo and bar separately and concatenates the results to the output.

Option --merge (-m) tells pellet4tf to join the results into one result set. The next two commands have the same results. (The first form makes one call to the underlying query engine, the second two.)

pellet4tf query -F -U "ont:SubjectField_10 ont:SubjectField_20" ../owl/sfield/TFSField.owl pellet4tf query -F -i "ont:SubjectField_10 ont:SubjectField_20" -J " " -U INPUT ../owl/sfield/TFSField.owl

Option --pack (-P) does not run any query. Instead it packs the given query parameters into an equivalent query URL for the TermFactory webapp. The webapp site is taken from TF option TF_APP. Example:

pellet4tf query -P -F1 -q sparql/select-term-equivalents.sparql -f2 TSV -i term:en-number-N_-_exp-number -F2 ../owl/tf-TFS.owl
http://localhost/TermFactory/query?q=file%3Aio%2Fsparql%2Fselect-term-equivalents.sparql&i=term%3Aen-number-N_-_exp-number&r=file%3Aowl%2Ftf-TFS.owl&f2=TSV&a=1

Boilerplate queries

A TF boilerplate query is an iterated SPARQL query with substitution parameters. A boilerplate query is run repeatedly, with the substitution parameters replaced everywhere in the query text, in turn, by inputs from a table of inputs. The substitution is plain string replacement in the SPARQL query text, so care must be taken to make sure that only the relevant bits in the query get substituted for.

Boilerplate queries are able to map SPARQL SELECT query results (TSV tables of variable bindings) back to RDF graphs. This makes the category diagram of different SPARQL query result types commute, leaving no blind alleys or dead ends.

Boilerplating is governed by the options --input (-i), --interpretation (-I), --joiner (-j), and --separator (-J) as follows. The input option is a string that is split with the joiner regexp into an array of input items. If an input item starts or ends with the value of option TF_LIST, it is taken to be a filename to load further inputs from. Example: -i "foo bar baz.tsv" splits into two input items and one file of further inputs. The joiner connects items on input rows, and the splitter separates rows.

The interpretation switch -I specifies an assignment of input values to substitution variables in the boilerplate script. The interpretation string is split into an array by splitter regex and the substituends in the array are matched positionally with columns (pointwise query) or rows (crosswise query) of input. If there are more substituends than parameters, or more parameters than substituends, the unmatched part is just skipped.

A default interpretation for a boilerplate script can be supplied in the query header (see section on query headers). An example of a query header line is # ?I=".?" (url encoded as I=%22%2e%3f%22). The substituend ".?" is a vacuous regular expression that matches any string. Given this header, an input parameter like i="a","i" substitutes for it a regex string that tells the query engine to filter for strings containing 'a' or 'A' (the "i" modifier specifies case insensitive search).

tfquery -i='"a","i"' -q=home:/io/scripts/list-resources.sparql tfs

An interpretation can also be supplied as a line of input data that starts with a question mark. A tsv file produced from a SELECT query starts with a line of bindings as shown in the example below. This line will be used as the default array of substituends in the boilerplate script. An interpretation from input data overrides an interpretation supplied in the query header. An explicit --interpretation (-I) parameter overrides both default interpretations.

Here is what boilerplating should do when there are multiple parameters with multiple values. One case is where the values form a table, each row aligned in a fixed number of corresponding columns, say:

?iri ?lang ?base ?cat
dbp:Fish fi kala N
dbp:Bird fi lintu N

Here, we want a boilerplate query to go by rows and do two queries, one for each row, each time doing four substitutions, one for each column, so as to produce two query results. In matrix terms, this is a pointwise product.

TF pointwise product

A boilerplate query can be used among other things to map the result table of a SELECT query back to graph format. The above table of term key fields is mapped to the graph

@prefix exp: <http://tfs.cc/exp/> .
@prefix term: <http://tfs.cc/term/> .
@prefix dbp: <http://dbpedia/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
[] a term:Term ;
  term:hasDesignation [ a exp:Designation ; exp:baseForm "kala" ; exp:catCode "N" ; exp:langCode "fi" ] ;
  term:hasReferent dbp:Fish .
[] a term:Term ;
  term:hasDesignation [ a exp:Designation ; exp:baseForm "lintu" ; exp:catCode "N" ; exp:langCode "fi" ] ;
  term:hasReferent dbp:Bird .

with command line pellet4tf query -F -d -i test.tsv -q tsv2tf.sparql empty.ttl where the table comes from test.tsv and the boilerplate is

# tsv2tf.sparql
#en construct terms from a table of key fields
#fi konstruoi termit taulukosta avainkenttiä
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX term: <http://tfs.cc/term/>
PREFIX exp: <http://tfs.cc/exp/>
PREFIX dbp: <http://dbpedia/resource/>
CONSTRUCT {
  [ rdf:type term:Term ;
    term:hasReferent $INPUT1 ;
    term:hasDesignation [ rdf:type exp:Designation ;
                          exp:langCode "(INPUT2)?" ;
                          exp:baseForm "(INPUT3)?" ;
                          exp:catCode "(INPUT4)?" ] ]
} WHERE {}

The repository empty.ttl is an empty file. (All the data comes from the input and the query in this case, so the empty repository file is only a formality.)
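Here the input file test.tsv would contain the table shown above, with a first line of bindings serving as the default interpretation (a sketch):

?iri ?lang ?base ?cat
dbp:Fish fi kala N
dbp:Bird fi lintu N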

Now consider another case, where the input is a collection of lines of varying length, the number of lines corresponding to the number of slots in the boilerplate.

dbp:Bird dbp:Fish ...
en fi sv ...

This arrangement takes all combinations of a concept on the first row and a language on the second row, producing 2x3 = 6 results, doing two substitutions per query. The parameters of the query are aligned with rows, and the values are multiplied out. The rows can be of varying length. In matrix terms, this is a cross product (outer product).

TF cross product

The TermFactory query engine uses flag cross (-X) to decide which way to carry out the multiplication. The default is to align parameters by column and do a pointwise product. When the flag is checked, script parameters correspond to rows and their values are multiplied out. Input for the first sort of query can be produced with a tabular SELECT query listing in TSV format. Input for the second type can be produced as a column of SELECT query listings. The row joiner (default: inline whitespace) and column separator (default: newline) are user settable.
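A sketch of a crosswise run along these lines (the boilerplate query name is hypothetical, and the quoted input contains a literal newline separating the two rows):

pellet4tf query -F -X -i "dbp:Bird dbp:Fish
en fi sv" -q sparql/term-by-concept-and-lang.sparql ../owl/tf-TFS.owl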

The following command line exemplifies the use of the input-file and output-file switches to process a file of inputs looping the same boilerplate query and dataset. The input file switch (-i) takes as argument a file of inputs separated by whitespace (say newline).

pellet4tf query -F -i dom.tsv -o dom/ -f TURTLE -q sparql/entry.sparql ../owl/sfield/TFSField.owl

The above command produces a terminology entry for each subject field in the TermFactory subject field classification in a local directory dom/.

Option --keep (-k) tells pellet4tf to enter a query loop keeping the same query engine and loaded repositories. New queries can be made by giving new sets of parameters on a command line. The loop ends with an end of file character:

pellet4tf query -k [other arguments ...]
pellet4tf query -V OFF -F -i "^a ^b" -q etc/scripts/select-term-keys-by-base-i.sparql owl/tf-TFS.owl

| base | exp | term | concept |
| "A" | http://tfs.cc/exp/tfs-A-N | http://tfs.cc/term/tfs-A-N_-_exp-Adjective | http://tfs.cc/exp/Adjective |
| "American English" | http://tfs.cc/exp/en-American_English-AN | http://tfs.cc/term/en-American_English-AN_-_exp-American_English | http://tfs.cc/exp/American_English |
| "adjective" | http://tfs.cc/exp/en-adjective-N | http://tfs.cc/term/en-adjective-N_-_exp-Adjective | http://tfs.cc/exp/Adjective |
| "adposition" | http://tfs.cc/exp/en-adposition-N | http://tfs.cc/term/en-adposition-N_-_exp-Adposition | http://tfs.cc/exp/Adposition |
| "affective value" | http://tfs.cc/exp/en-affective_value-N | http://tfs.cc/term/en-affective_value-N_-_term-affect | http://tfs.cc/term/affect |
| "aihealue" | http://tfs.cc/exp/fi-aihealue-N | | |
| "aihealueet" | http://tfs.cc/exp/fi-aihealueet-N | http://tfs.cc/term/fi-aihealueet-N_-_meta-hasSubjectField | http://tfs.cc/meta/hasSubjectField |
| "aika" | http://tfs.cc/exp/fi-aika-N | http://tfs.cc/term/fi-aika-N_-_sem-hasTime | http://tfs.cc/sem/hasTime |
| "alias" | http://tfs.cc/exp/fi-alias-N | http://tfs.cc/term/fi-alias-N_-_owl-sameAs | http://www.w3.org/2002/07/owl0#sameAs |
| "any" | http://tfs.cc/exp/en-any-D | http://tfs.cc/term/en-any-D_-_meta-Value | http://tfs.cc/meta/Value |
| "arvo" | http://tfs.cc/exp/fi-arvo-N | http://tfs.cc/term/fi-arvo-N_-_meta-value | http://tfs.cc/meta/value |

| base | exp | term | concept |
| "base form" | http://tfs.cc/exp/en-base_form-N | | |
| "basic form" | http://tfs.cc/exp/en-basic_form-N | http://tfs.cc/term/en-basic_form-N_-_exp-baseForm | http://tfs.cc/exp/baseForm |

Edit utility

The TF editing back-end is implemented with class grapson.tf.rev.jena.Edit. It can be run from command line using the script tfedit in io/script .

tfedit is a commandline utility for carrying out editing operations. The unit of an edit operation by tfedit is an RDF graph. tfedit knows how to add, subtract, and take meets and differences of RDF triple sets. The typical scenario is where a larger ontology (called the active) is revised by removing an original subset and adding an edited version of the subset. In order to minimise transactions with the active ontology, the differences between the original and the edits are determined first, and it is the differences that are removed from and added to the active ontology, respectively. tfedit works as a postfix calculator. Operands are graphs that get pushed on the stack on read. Each postfix operator pops one to three operands and replaces them with the result of the operation on top of the stack. Common options can be freely interspersed.
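As a sketch of the postfix behavior (the file names are hypothetical), the command

tfedit -F A.ttl B.ttl --del C.ttl --add > result.ttl

pushes A and B, replaces them with the difference A minus B, then pushes C and replaces the two topmost graphs with their union, so the result written to standard output is (A minus B) plus C.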

tfedit Usage: --help(=options|level|format|style) (-h) ( ADDRESS --check | --edits | --orig | --pop | --factor=skolem|blank|names|keys|format ADDRESS ADDRESS --add | --del | --diff | --swap | ADDRESS ADDRESS ADDRESS --edit ) ... | queryString

The parameters and options are detailed below.

Specific options:

option short form description values
--add add the triples of the graph on top of the stack to the next operand binary postfix operator
--check extract original and edits from top of stack for editing. Skolemises blanks by default. unary postfix operator
--del delete the triples of the graph on top of the stack from the next operand binary postfix operator
--diff compute the diff of the two topmost graphs (original and edits) binary postfix operator
--edit Edit first arg (active) with original (next) and edits (top of stack): delete original minus edits and add edits minus original ternary postfix operator
--edits extract edits from top of stack unary postfix operator
--factor apply a resource renaming operation to top of stack unary postfix operator
--orig -i extract original from top of stack. unary postfix operator
--pop -i removes operand from top of stack unary postfix operator
--swap swap the top two elements of the stack binary postfix operator

Each postfix operator pops as many operands from the stack as it needs (an operand is an address, or text contents from standard input when - is given as a placeholder argument) and pushes the result on the stack. By default, blanks are converted to skolem constants. Real blanks are shown in red in the default HTML skin; skolem constants are black. Check and diff are inverses. The queryString argument is an edit servlet querystring. This allows testing servlet operations offline.

Usage examples follow.

tfedit -F blank.ttl skolemdiff.ttl --check --edit

Here is an example of command line editing. entity.ttl is a bilingual WordNet entry and entity-fi.ttl is its editable Finnish language content (the active model). entity-edits.ttl is the result of changing occurrences of the Finnish word kokonaisuus 'whole' in entity.ttl to the word olio 'being, entity'.

tfedit entity-fi.ttl entity-original.ttl entity-edits.ttl --del --del entity-edits.ttl entity-original.ttl --del --add > entity-fi-edited.ttl

This deletes from entity-fi.ttl what entity-edits.ttl deleted from entity-original.ttl and then adds what it added.

tfedit entity.ttl entity-original.ttl entity-edits.ttl --del --del entity-edits.ttl entity-original.ttl --del --add > entity-edited.ttl

This does the same to entity.ttl.

factor --format=HTML entity-edited.ttl --active=entity-fi-edited.ttl --schema=../owl/wn/TFwn.owl > entity-edited.html

This writes out entity-edited.ttl in xhtml for further editing, using the updated Finnish file entity-fi-edited.ttl as the active model.

Operation --edit allows telescoping the three steps above into one commandline:

tfedit --schema=TFWn.owl --format=HTML --active=entity-fi.ttl entity.ttl entity-original.ttl entity-edits.ttl --edit > entity-edited.html

The operands to tfedit can also be given from standard input. A bare filetype like .ttl is read from standard input in the format indicated by the filetype. An operand that starts with the string ?p= is interpreted as an EditForm edit URL. Empty operands can be indicated by supplying names of nonexistent files:

tfedit - - .ttl --edit < olio.ttl

The dash placeholders for the missing active model and original on the line are not reserved; it is simply assumed that there happens to be no file named - (dash).

Warning: Some ontology repositories may merge a model with its imports. When fetching the active model, one should take care lest the model includes the imports (unless that is what one wants).

Another warning: editing a redundant ontology that mixes axioms and theorems may not produce the intended result, for unless the axioms are changed, the theorems will reappear when the reasoner is applied.

With the --keep=true (-k) option, tfedit loops executing new commandline arguments using the same engine until end of file (^D on Linux).

The tfedit utility uses SPARQL UPDATE queries, which allow stating update operations as queries to an RDF database. tfedit also works on read-only ontologies and returns an updated version of the ontology. For example, assume we want to edit the term for "Termitehdas", changing the classification of the designation from exp:Designation to exp:Appellation. The active (blank.ttl):

@prefix exp: <http://tfs.cc/exp/> .
@prefix term: <http://tfs.cc/term/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix meta: <http://tfs.cc/meta/> .
[] a owl:Thing , term:Term ;
  term:hasDesignation [ a owl:Thing , exp:Designation , exp:Noun ;
                        exp:baseForm "Termitehdas" ;
                        exp:catCode "N" ;
                        exp:langCode "fi" ] ;
  term:hasReferent meta:TermFactory .

It would not do to rewrite just any (or every) blank triple [ a exp:Designation ] to [ a exp:Appellation ] in the active ontology, for this would affect a random (or every) anonymous designation, while we only want to edit this occurrence. But there is no way to identify individual blank nodes across ontologies. A blank node marks existential quantification whose scope includes all the triples in which that blank occurs, plus recursively, the scopes of any other blanks in the scope of the first one. The scope is known as blank node closure. It is this closure that can be edited as a unit. To edit a blank in an active ontology, one must include its closure in the original. A TF DESCRIBE query does that.

To edit an original containing blanks, we first replace blanks with temporary (skolem) constants, edit the skolemised term, and then let tfedit calculate the update query from the differences between the skolemised original and edits. In our example: original (skolem.ttl):

@prefix exp: <http://tfs.cc/exp/> .
@prefix term: <http://tfs.cc/term/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix meta: <http://tfs.cc/meta/> .
<urn:blank:-2e9cda96:1449b97d6ed:-7f18> a owl:Thing , term:Term ;
  term:hasDesignation <urn:blank:-2e9cda96:1449b97d6ed:-7f19> ;
  term:hasReferent meta:TermFactory .
<urn:blank:-2e9cda96:1449b97d6ed:-7f19> a owl:Thing , exp:Designation , exp:Noun ;
  exp:baseForm "Termitehdas" ;
  exp:catCode "N" ;
  exp:langCode "fi" .

edits (skolem2.ttl):

@prefix exp: <http://tfs.cc/exp/> .
@prefix term: <http://tfs.cc/term/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix meta: <http://tfs.cc/meta/> .
<urn:blank:-2e9cda96:1449b97d6ed:-7f18> a owl:Thing , term:Term ;
  term:hasDesignation <urn:blank:-2e9cda96:1449b97d6ed:-7f19> ;
  term:hasReferent meta:TermFactory .
<urn:blank:-2e9cda96:1449b97d6ed:-7f19> a owl:Thing , exp:Appellation , exp:Noun ;
  exp:baseForm "Termitehdas" ;
  exp:catCode "N" ;
  exp:langCode "fi" .

The edit command tfedit -F blank.ttl skolem.ttl skolem2.ttl --edit forms the update request

tf-exia com.hp.hpl.jena.sparql.modify.request.Update4TF: edit request
PREFIX exp: <http://tfs.cc/exp/>
PREFIX term: <http://tfs.cc/term/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX meta: <http://tfs.cc/meta/>
DELETE { ?b_7f19 rdf:type exp:Designation . }
INSERT { ?b_7f19 rdf:type exp:Appellation . }
WHERE {
  ?b_7f18 term:hasReferent meta:TermFactory .
  ?b_7f18 rdf:type term:Term .
  ?b_7f19 exp:baseForm "Termitehdas" .
  ?b_7f19 rdf:type owl:Thing .
  ?b_7f19 rdf:type exp:Noun .
  ?b_7f18 term:hasDesignation ?b_7f19 .
  ?b_7f18 rdf:type owl:Thing .
  ?b_7f19 exp:catCode "N" .
  ?b_7f19 rdf:type exp:Designation .
  ?b_7f19 exp:langCode "fi" .
  FILTER ( isBlank(?b_7f18) && isBlank(?b_7f19) ) .
}

The update query makes sure only the designation of the term "Termitehdas" is affected, although the term and its designation are anonymous. The result of the edit is this:

@prefix exp: <http://tfs.cc/exp/> .
@prefix term: <http://tfs.cc/term/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix meta: <http://tfs.cc/meta/> .
[] a owl:Thing , term:Term ;
  term:hasDesignation [ a exp:Appellation , owl:Thing , exp:Noun ;
                        exp:baseForm "Termitehdas" ;
                        exp:catCode "N" ;
                        exp:langCode "fi" ] ;
  term:hasReferent meta:TermFactory .

With the --quads (-4) flag set, a TF datastore can be edited with contexted models (quad files). The active must be a writable datastore. The edit operation is factored so that each context named in the arguments of the operation is edited separately in the given store. For example, in

tfedit -F -4 dav:/sample/ bob.ttl carol.ttl --edit

dav:/sample is a dav directory store containing named files, and the quad files (contexted models) bob.ttl and carol.ttl may refactor it by moving statements from one file to another.

Copy utility

tfcopy is a facility for storing TF documents into specific locations in the TermFactory repository system. In particular, it provides for copying entries or ontologies into WebDAV web directories, indexed by the URI path.

tfcopy is a wrapper around the java class com.grapson.tf.rev.jena.Copy.java.

tfcopy Usage: --help(=options|level|format|style) (-h) --writable ADDRESS (ADDRESS)

Specific options:

option short form description values
--source -s download address or contents to copy address
--out -o upload address to copy to address

For instance, command line

tfcopy -F home:/entity.html "--asm=dav --name=http://purl.org/vocabularies/princeton/wn30/synset-entity-noun-1"

copies Wordnet entry entity.html from current directory to webdav subdirectory purl.org/vocabularies/princeton/wn30/ by name synset-entity-noun-1. The entry is then downloadable from address http://localhost/dav/purl.org/vocabularies/princeton/wn30/synset-entity-noun-1 . Here is another matching pair of command lines.

tfcopy -F bla.html --asm=dav --user=name --pass=word
tfget --user=name --pass=word --asm=dav --name=home:/io/bla.html

With the --keep=true (-k) option, tfcopy loops executing new commandline arguments using the same engine until end of file (^D on Linux).

tfcopy will not overwrite an existing document unless the overwrite flag is set.

tfcopy http://localhost/active.ttl
copy ok http://localhost/dav/localhost/active.ttl length 1024
tfcopy http://localhost/active.ttl
copy at http://localhost/dav/localhost/active.ttl exists, not overwritten
tfcopy http://localhost/active.ttl over=true
copy ok http://localhost/dav/localhost/active.ttl length 211

pellet4tf utility

Pellet4TF is the Pellet reasoner adapted to TF as Pellet4TF.java. It can be run using the command line script io/bin/pellet4tf.

pellet4tf classify

pellet4tf classify adds JSON and RDF/XML output formats for classifications, and the ability to query the class tree from a given seed up and down to a given depth.

PelletClassify for TF: Classify a TF ontology [around a seed class] and display the hierarchy
Usage: <pellet> classify [options] <file URI>...
Argument description:
 --help, -h            Print this message
 --verbose, -v         Print full stack trace for errors.
 --conf, -C            (configuration file) Use the selected configuration file
 --persist, -p         Enable persistence of classification results. The classifier will save its internal state in a file, and will reuse it the next time this ontology is loaded, therefore saving classification time. This option can only be used with OWLAPIv3 loader.
 --loader, -l          (Jena | OWLAPI | OWLAPIv3 | KRSS) Use Jena, OWLAPI, OWLAPIv3 or KRSS to load the ontology (Default: OWLAPIv3)
 --ignore-imports      Ignore imported ontologies
 --input-format        (RDF/XML | Turtle | N-Triples) Format of the input file (valid only for the Jena loader). Default behaviour is to guess the input format based on the file extension.
 --file, -F            resolve file urls to working directory
 --seed, -s            (C) One class URI or local name to build a taxonomy for. Example: "Animal"
 --up, -u              (I) Number of levels up from seed in classification tree. Default -1 (all)
 --down, -d            (I) Number of levels down from seed in classification tree. Default -1 (all)
 --invert, -i          Invert properties. Default false
 --output-format, -f   (text | JSON | RDF/XML) Format of graph (Default: text)

Here is an example of using pellet4tf classify. Shown is the classification tree around the concept exp:Designation in TFS.owl up and downward to arbitrary depth (-1).

pellet4tf classify -d -2 -u -2 -s http://tfs.cc/exp/Designation ../owl/TFS.owl
Wed Jan 09 00:51:48 EET 2013 INFO: taxonomy for http://tfs.cc/exp/Designation up -1 down -1
http://tfs.cc/exp/Designation
http://tfs.cc/exp/Appellation
http://tfs.cc/exp/Designation
http://tfs.cc/exp/Expression
http://tfs.cc/meta/Meta
_TOP_
http://tfs.cc/meta/Object
_TOP_

Prefixes are not supported, so the seed must be a full resource URI. The classifier uses the original Pellet file loader, not TF tfget.

pellet4tf query

pellet4tf is an alternative commandline facility for TF that directly extends the original Pellet. pellet4tf query retains a few options from original Pellet that are not available in tfquery (verbose, config, bnode, display-query).

pellet4tf help query
PelletQuery4TF: Query Engine for TermFactory
Usage: <pellet> query [options] <file URI>...
Argument description:
--help, -h            Print this message (Default: false)
--verbose, -v         Print full stack trace for errors. (Default: false)
--conf, -C (configuration file)  Use the selected configuration file
--active, -A          HTML active ontology
--bnode               Treat bnodes in the query as undistinguished variables. Undistinguished
                      variables can match individuals whose existence is inferred by the
                      reasoner, e.g. due to a someValuesFrom restriction. This option has no
                      effect if ARQ engine is selected. (Default: false)
--bridge, -B          bridge schema
--create, -c          create on save (Default: false)
--depth, -d           DESCRIBE query depth (Default: -1)
--describe, -D (<URI or TF alias>)  URI/s to describe.
--describe-query, -G (<URL or TF alias>)  Read TF DESCRIBE graph pattern from the given file
--display-query       Display the input query (Default: false)
--edits, -E           HTML edited ontology
--encoding            Output encoding (UTF-8 or UTF-16). (Default: UTF-8)
--engine, -e (Pellet | ARQ | Mixed | SPARQL | Stacked)  The query engine that will be used.
                      Default behavior is to auto select the engine that can handle the given
                      query with best performance. Pellet query engine is typically the fastest
                      but cannot handle FILTER, OPTIONAL, UNION, DESCRIBE or named graphs. Mixed
                      engine uses ARQ to handle SPARQL algebra and uses Pellet to answer Basic
                      Graph Patterns (BGP) which can be expressed in SPARQL-DL. ARQ engine uses
                      Pellet to answer single triple patterns and can handle queries that do not
                      fit into SPARQL-DL. As a consequence SPARQL-DL extensions and complex class
                      expressions encoded inside the SPARQL query are not supported. SPARQL is
                      just plain Jena ARQ without Pellet. Stacked engine first runs a DESCRIBE
                      query on input and then runs the main query on the result set
--file, -F            resolve all relative file urls to cwd
--file1, -F1          resolve relative file urls to cwd for query and input
--file2, -F2          resolve relative file urls to cwd for options and repository
--format, -f (GF | HTML | JSON | N3 | NT | Properties | JSONLD | JSONLD_FLAT | RDF/XML |
                      RDF/XML-ABBREV | Tabular | TBX | TSV | TURTLE)  Format of result (JSON,
                      Tabular, TSV for bindings; HTML both; rest for graphs)
--glob, -g            print global names (iris) (Default: false)
--ignore-imports      Ignore imported ontologies (Default: true)
--input, -i           An array of inputs to substitute in query. Text or address
--input-format (RDF/XML | Turtle | N-Triples)  Format of the input file (valid only for the Jena
                      loader). Default behaviour is to guess the input format based on the file
                      extension.
--joiner, -j          Regex for splitting input columns
--keep, -k            loop keeping loaded repo (Default: false)
--lang, -l            HTML localization language
--lex                 GF lexicon name
--level, -v           logging level (OFF|ERROR|WARN|INFO|DEBUG|TRACE|ALL) (Default: INFO)
--limit, -N           result set size limit (-1 means no limit)
--lion, -L            HTML localization ontology
--lynx, -H            HTML hyperlinks file
--merge, -m           merge boilerplate query results together
--notry, -n           no location mapping (Default: false)
--nowait, -w          background query (no waiting) (Default: false)
--offset, -M          result set offset
--original, -O        HTML original
--out, -o             place to store the query results. Examples: <file URL>, <WebDAV URL>,
                      --asm=<assembly URL> --name=<graph IRI>, --service=<SPARQL service URL>
                      --name=<graph IRI>
--cross, -X           cross multiply parameter values
--over, -x            overwrite on save (Default: false)
--pack, -P            Construct query string (Default: false)
--interpretation, -I  Query substitution assignment
--pass                password (Default: pass)
--prefixes, -p        model to copy prefixes from (RDF file)
--query, -q (<URL or TF alias>)  Read the SPARQL (or RDQL) query from the given file
--query-text (<query text>)  Read the SPARQL (or RDQL) query from command line
--query-format (SPARQL | ARQ | RDQL)  The query format (Default: SPARQL)
--readAll, -a         Load imported ontologies (Default: false)
--factor, -W          factor task (deblank|reblank|relabel|reformat...)
--root, -R            HTML entry root
--schema, -S          HTML layout bridge ontology
--skin, -K            HTML skin
--splitter, -J        Regex for splitting input rows
--style, -Y           HTML style (properties file)
--template, -T        HTML entry template
--time, -Z            Print detailed timing information (Default: false)
--tree                exclude cycles from HTML layout (Default: false)
--try, -t (<URL or TF alias>)  URL to resolve using retry.
--untry, -y (<URL or TF alias>)  URL to unresolve using retry.
--uri, -U (<URI or TF alias>)  root URI/s to describe (same as --describe X --root X).
--url, -u (<URL or TF alias>)  URL to get using retry.
--user                username (Default: guest)
--writeAll, -b        Write imports and inferences (Default: false)
TF offline workflow

TF offline workflow

The TF utilities play together to contribute to the workflow. Get locates entries and documents in the TF cloud. Factor converts between formats and does factoring of names. Pellet4TF is the query engine and reasoner. Edit supports ontology revisions. Copy handles distributing entries and documents across TermFactory repositories. A counterclockwise roundtrip on the perimeter of the figure traces one possible revisioning cycle of the TF country code ontology using TF commandline tools.

Show/hide TF offline workflow

TF Tools

Ontology work

This section documents one aspect of ontology based professional terminology work, viz. mixing and matching of term collections using OWL editors, reasoners, and converters. Using ontology tools can save manual labor in matching and merging data from different sources. Existing OWL reasoners and query engines are generic tools whose application to terminological ontologies requires a high level of ontology expertise and remains semiautomatic at best. The aim of the TF specific ontology tools is to telescope the workflow so as to minimise expert intervention.

Extracting subsets

One of the basic tasks in using TF is extracting some desired subset of data from one or more repositories as a dataset. This section surveys different ways of going about this task.

Extracting subsets using Jena and Pellet

To cut out a desired subset of an ontology, one can use the Jena RDF query engine ARQ and the Clark&Parsia SPARQL-DL query engine Pellet. The ARQ RDF query engine implements all of RDF SPARQL, including the SPARQL language DESCRIBE query and the FILTER function REGEX. On the other hand, since it is an RDF tool, it does not do OWL reasoning or understand OWL imports. The Pellet ARQ engine queries individual triples with Pellet and does the rest with ARQ. The Pellet Mixed engine uses Pellet for basic graph pattern queries but uses ARQ to apply SPARQL UNION or FILTER constructs on the results. Running the Jena ARQ command line tool sparql with the command line

sparql --query=tfquery2d.sparql --data=../../owl/TFS.owl

where tfquery2d.sparql is the following query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX term: <http://tfs.cc/term/>
DESCRIBE ?inst
WHERE { ?inst rdf:type term:Term . }

prints out an ontology describing instances of class term:Term in terms of the statements asserted about them in TFS.owl (in Turtle RDF format).

Show/hide query results
@prefix : <http://tfs.cc/owl/TFS.owl#> . @prefix exp: <http://tfs.cc/exp/> . @prefix tfs: <http://tfs.cc/> . @prefix meta: <http://tfs.cc/meta/> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix exp: <http://tfs.cc/exp/> . @prefix owl2xml: <http://www.w3.org/2006/12/owl2-xml#> . @prefix term: <http://tfs.cc/term/> . @prefix ont: <http://tfs.cc/ont/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix isocat: <http://isocat.org#> . term:en-English-N_-_ont-English rdf:type owl:Thing ; rdf:type term:Term ; term:designation expression1:en-English-N ; term:hasReferent concept0:English . term:zh-程序-N_-_ont-Parser rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; term:hasDesignation expression1:zh-程序-N ; term:hasReferent concept0:Parser . term:zh-芬兰语-N_-_ont-Finnish rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; term:hasDesignation expression1:zh-芬兰语-N ; term:hasReferent concept0:Finnish . term:fi-jäsennin-N_-_ont-Parser rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:TFS ; term:hasDesignation expression1:fi-jäsennin-N ; term:hasReferent concept0:Parser . term:en-for-P_-_ont-Goal rdf:type owl:Thing ; rdf:type term:Term ; term:hasDesignation expression1:en-for-P ; term:hasReferent concept0:Goal . term:zh-剖析-V_-_ont-Parse rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; term:hasDesignation expression1:zh-剖析-V ; term:hasReferent concept0:Parse . term:en-Chinese-A_-_ont-Chinese rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; meta:hasSource meta1:Wang ; term:hasDesignation expression1:en-Chinese-N ; term:hasReferent concept0:Chinese . term:zh-为了-P_-_ont-Goal rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; term:hasDesignation expression1:zh-为了-P ; term:hasReferent concept0:Goal . term:zh-在___里-P_-_ont-Place rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; term:hasDesignation expression1:zh-在___里-P ; term:hasReferent concept0:Place . term:zh-是-V_-_ont-Role rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; term:hasDesignation expression1:zh-是-V ; term:hasReferent concept0:Role . term:en-in-P_-_ont-Place rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:TFS ; term:hasDesignation expression1:en-in-P ; term:hasReferent concept0:Location . term:en-be-V_-_ont-Role rdf:type owl:Thing ; rdf:type term:Term ; rdfs:comment "English verb for the subclass relationship"@en ; meta:hasSource meta1:TFS ; term:hasDesignation expression1:en-be-V ; term:hasReferent concept0:Role . term:zh-部分-N_-_exp-Preposition rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:TFS ; term:hasDesignation expression1:zh-部分-N ; term:hasReferent expression0:Preposition . term:zh-英语-N_-_ont-English rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; term:hasDesignation expression1:zh-英语-N ; term:hasReferent concept0:English . term:Term rdf:type owl:Thing ; rdf:type term:Term . term:fi-suomi-N_-_ont-Finnish rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:TFS ; term:hasDesignation expression1:fi-suomi-N ; term:hasReferent concept0:Finnish . term:en-part-N_-_ont-Part rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:TFS ; term:hasDesignation expression1:en-part-N ; term:hasReferent concept0:Part . 
term:zh-部分-N_-_ont-Part rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; term:hasDesignation expression1:zh-部分-N ; term:hasReferent concept0:Part . term:en-parse-V_-_ont-Parse rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:TFS ; term:hasDesignation expression1:en-parse-V ; term:hasReferent concept0:Parse . term:en-program-N_-_ont-Program rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:TFS ; term:hasDesignation expression1:en-program-N ; term:hasReferent concept0:Software . term:zh-中文-N_-_ont-Chinese rdf:type owl:Thing ; rdf:type term:Term ; meta:hasSource meta1:Ji ; meta:hasSource meta1:Wang ; term:hasDesignation expression1:zh-中文-N ; term:hasReferent concept0:Chinese .

Because ARQ as an RDF tool does not apply an OWL reasoner, one cannot use it to capture statements which are only implied by an ontology, such as transitive closures. To run a query over, say, all the subclasses or members of a class, we can use the Pellet SPARQL-DL query engine.

Running the Pellet command line tool pellet.sh with command line

pellet query -q tfquery0.sparql $TF_HOME/owl/TFS.owl

where tfquery0.sparql is the following query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ont: <http://tfs.cc/ont/>
PREFIX term: <http://tfs.cc/term/>
PREFIX exp: <http://tfs.cc/exp/>
PREFIX meta: <http://tfs.cc/meta/>
CONSTRUCT { ?conc rdfs:subClassOf ont:Concept . }
WHERE { ?conc rdfs:subClassOf ont:Concept . }

prints out an ontology that asserts all subClassOf statements under ont:Concept asserted or entailed by TFS.owl (in RDF/XML format):

Show/hide pellet log
<rdf:RDF xmlns:tfs="http://tfs.cc/" xmlns:isocat="http://isocat.org#" xmlns="http://tfs.cc/owl/TFS.owl#" xmlns:ont="http://tfs.cc/ont/" xmlns:term="http://tfs.cc/term/" xmlns:owl2xml="http://www.w3.org/2006/12/owl2-xml#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:meta="http://tfs.cc/meta/" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:exp="http://tfs.cc/exp/" <rdf:Description rdf:about="http://tfs.cc/ont/Program"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Ends"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Linguistics"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Information"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Language_industry"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Means"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Finnish"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Information_and_communication_technology"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Inanimate"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Instrument"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Geography"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Software"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Source"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/SubjectField"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Input"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Chemistry"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Data"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Place"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Patient"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Construction_industry"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Animate"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://www.w3.org/2002/07/owl#Nothing"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description 
rdf:about="http://tfs.cc/ont/Function"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Chinese"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Country"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Opinion"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Language_technology"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Agent"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Human"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Terminology"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Part"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Time"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/American_English"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Role"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Effect"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Goal"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Language"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Location"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Multilingual_language_technology"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Cause"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Parser"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/ont/Concept"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> <rdf:Description rdf:about="http://tfs.cc/exp/English"> <rdfs:subClassOf rdf:resource="http://tfs.cc/ont/Concept"/> </rdf:Description> </rdf:RDF>

On the other hand again, the current version (Pellet 2.0.0.rc7) of the Pellet query engine does not do DESCRIBE queries. Also, the result set of a SPARQL-DL query contains both asserted and entailed statements indiscriminately.

Pellet is able to extract certain types of ontology subsets. pellet extract extracts all statements of a given OWL type from an ontology. The allowed types are:

DefaultStatements, AllClass, AllIndividual, AllProperty, ClassAssertion, DataPropertyAssertion, DifferentIndividuals, DirectClassAssertion, DirectSubClassOf, DirectSubPropertyOf, DisjointClasses, DisjointProperties, EquivalentClasses, EquivalentProperties, InverseProperties, ObjectPropertyAssertion, PropertyAssertion, SameIndividual, SubClassOf, SubPropertyOf. Example: "DirectSubClassOf DirectSubPropertyOf" (Default: DefaultStatements)

The (closed world, nonmonotone) query predicates directRDFType, directSubClassOf, and directSubPropertyOf are useful for extracting terminological concept systems out of an ontology.

pellet modularity extracts "safe" modules (sub-ontologies) around given target entities. A safe module contains enough statements around the target concepts that the inferences from the target concepts in the extract are the same as in the original ontology, i.e. any statement added to the extract will cause a conflict in the extract if it would contradict the original ontology. Unfortunately, safe modules tend to be too large or too hard to find for practical purposes if the ontology is well connected.

Different tools can be chained. Say we want to extract from TFS.owl all and only the statements asserted in it about classes that are (asserted or entailed ) subclasses of ont:Concept . Here is one way. First select the relevant classes from TFS.owl using the Pellet engine. Temporarily add the inferred subclass statements to TFS.owl (as assertions, say using some editor). Then run the ARQ engine to describe all the (now) asserted subclasses of ont:Concept in the extended model. The result is the desired sub-ontology. Though doable, this involves many steps of tedious work. The TF tools try to reduce such intermediate steps.
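
For concreteness, here is a minimal command-line sketch of such a chain, using the pellet and sparql tools shown above. The merge step (here Jena's riot, though any RDF merge will do) and the file names subclasses.rdf, TFS-plus.ttl and describe-concepts.sparql are illustrative assumptions, not fixed TF conventions.

# 1. materialize the entailed subclass links with Pellet (CONSTRUCT query as above)
pellet query -q tfquery0.sparql TFS.owl > subclasses.rdf
# 2. merge the inferred statements with the asserted model
riot --output=turtle TFS.owl subclasses.rdf > TFS-plus.ttl
# 3. DESCRIBE the now-asserted subclasses of ont:Concept with ARQ
sparql --query=describe-concepts.sparql --data=TFS-plus.ttl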

Extracting subsets from YSO using the Sesame RDF library

In an early stage of the TF project, the FinnOnto YSO thesaurus ontology turned out to be too big to handle with Protege or Jena tools in the space available on then-current desktop computers. The more robust Sesame 2 RDF repository library and its query language seRQL were used to extract manageable sized coherent subsets from YSO around concepts matching a given pattern.

The extraction tool consisted of the following pieces:

SuperGraph.java
a Java program built on the Sesame RDF library which reads the YSO ontology, finds all concepts matching a given pattern, and extracts an upward closure of the matched concepts under the YSO schema from YSO. The superclasses of each concept are included in the closure, but narrower, related or associated concepts are not included recursively.
YSO_schema.owl
a manually extracted schema subset of YSO.owl
YSO_header.owl
an RDF header file to be included in the extract
YSO.owl
local copy of the YSO ontology
foaf.owl
local copy of the Friend of a Friend ontology referenced by YSO
skos.owl
local copy of the Simple Knowledge Organization Systems ontology referenced by YSO

The hits are marked up with string property hit:pattern which holds the search pattern. They can be retrieved from the extract using the Protege SWRL query tool with the query SELECT ?subj WHERE ?subj hit:pattern ?obj .

The SuperGraph script is called with

java SuperGraph YSO.owl YSO_schema.owl YSO_header.owl pattern

The pattern is given as a Sesame query language seRQL expression, where 'string' refers to the string value of the meta:label of a class, for example

"string LIKE \"*rakennus*\" AND NOT string like \"*kehon*\"".


Merging

Another basic task in using TF is merging data from two or more sources into one ontology or dataset. This section surveys different ways of going about this task.

One can also use tfedit to add and subtract ontologies.

Extracting, combining and matching terminological ontologies using generic and TF ontology tools has been studied in connection with merging the epidemic ontologies PULS and Biocaster. The epi related SPARQL scripts are in TF svn in directory cnv/script.

There are two main ways of merging ontologies in TF, by rewriting, or by importing and bridging. In rewriting, the contents of the donor ontology are reshaped into TF entities, removing the original structure. In importing and bridging, a donor OWL ontology imports TFS and its concepts are related to the TFS top ontology using bridge axioms. Rewriting avoids redundancy, saves space and creates better integration, but concurrent development of ontologies is difficult. Importing and bridging creates redundancy, bloat and clutter, but allows ontologies to keep their identity under the merger. The two methods can also apply in sequence, which may give something of both. Query imports offer a dynamic method of partial merging of ontologies.
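
To illustrate what a bridge axiom can look like, here is a minimal Turtle sketch. The bridge ontology URI and the donor namespace bioc: are hypothetical placeholders; only the TFS URIs come from this manual.

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ont:  <http://tfs.cc/ont/> .
@prefix bioc: <http://example.org/biocaster#> .

# the bridge ontology imports the TF top ontology
<http://example.org/bio-bridge.owl> a owl:Ontology ;
    owl:imports <http://tfs.cc/owl/TFS.owl> .

# bridge axiom: a donor class is aligned under a TFS top concept
bioc:Disease rdfs:subClassOf ont:Concept .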

The workflow in merging and matching the BioCaster and PULS epidemic-disease ontologies is described as follows. The tasks below have the form source -method-> target , where source and target are files and method is a shorthand for a command line to run or to manual steps (e.g. using an editor). This describes the workflow before implementing the TF query engine. In the following section, we show how the workflow gets simplified using the TF query engine.

To run one of the *.xsl scripts, use an XSLT processor (Xalan, Saxon, xsltproc, or other). For instance, to do the first task, run the command java org.apache.xalan.xslt.Process -in BioCaster2008-04-20.owl -xsl bioc.xsl -out bioc.owl .

Ontology conversion

This section traces the steps taken to convert PULS and BioCaster ontologies to TF.

source/s method target task
BioCaster2008-04-20.owl bioc.xsl bioc.owl (converts the thesaurus, i.e. the diseases)
biot.xsl biot.owl (converts the terms)
TFS.owl

biot.owl

bioc.owl
write bridge biob.owl (imports and bridges biocaster)
source/s method target task
def-basic-concepts-disease.lisp expel puls-basic.ttl (converts the class schema using a lisp program)
def-all-disease-list.lisp puls.perl puls-all.ttl (converts the disease list using a perl script)
puls-basic.ttl

puls-all.ttl
write bridge puls-bridge.owl (imports and bridges puls)
source/s method target task
TFS.owl

bio-bridge.owl

puls-bridge.owl
write bridge epi.owl (imports and bridges puls and biocaster into one)
Ontology merging

RDF does not do imports. One can form a SPARQL dataset of the model and its imports. Another solution is to merge the imports into one file.

source/s method target task
epi.owl Protege4 or factor epi2.owl (merge epi.owl to epi2.owl, see Note 2 below)

Matching

This section traces the steps taken to match the diseases in two ontologies. Since only disease name similarity was used in the matching, the diseases were first extracted from the leveled ontology using SPARQL and then compared.

source/s method target task
epi2.owl epi-extract epi-extract.owl (extracts the diseases and their English names)
epi-extract.owl epi-match epi-match.ttl (compares the PULS and BioCaster disease names, see Note 1 below)
Ontology extraction

This section traces a series of steps to extract a sub-ontology describing the diseases.

epi2.owl epi-instance epi-instance.ttl (use sparql to describe the matched instances)
epi2.owl epi-concept epi-concept.ttl (use sparql to describe the matched classes)
epi2.owl epi-term epi-term.owl (use pellet to extract the terms for the matches)
epi2.owl epi-class epi-class.owl (use pellet to extract the superclasses of the matches)
epi2.owl epi-tree epi-tree.ttl (use sparql to extract the superclass tree of the matches)
epi-instance.ttl

epi-concept.ttl

epi-tree.ttl

epi-term.owl
manual epi-merge.owl (merge the extracted parts)

Notes:

  1. The match condition in epi-match.sparql compares disease names case-insensitively using the following ad hoc similarity criterion: two names match if each matches the other as a regular expression (see the sketch after these notes). This is just a first stab that can be improved. Further match conditions that concern semantic properties beyond the mere label could be added.
  2. Exporting TF ontologies with Protege 4 fails when resource URIs contain non-ASCII Unicode characters. Protege 4.0 generated unused entity declarations and XML namespace prefix declarations for URIs containing such characters. Protege 4.1.0 (build 209) breaks when exporting the leveled ontology. Ontology imports can also be merged by the factor utility option --writeAll (-a).
  3. Protege 4 generates OWL 1.1 / 2.0 schema elements which Pellet version 2.0.0.rc7 does not support.
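
A minimal SPARQL sketch of such a bidirectional, case-insensitive regular expression match as a filter condition (the variable names ?pulsName and ?bcName are illustrative, not the ones used in epi-match.sparql):

FILTER ( regex(?pulsName, ?bcName, "i") && regex(?bcName, ?pulsName, "i") )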

The sequence of ontology merging and extraction steps described above can be telescoped into one pellet4tf query.

Disease ontology match query

Disease ontology match result

Importing the alignment to epi.owl virtually merges the matched diseases so that an OWL reasoner or editor treats each pair as one and the same entity. Alternatively, a DESCRIBE query can be written to extract a small ontology containing just the matched diseases.

The alignment is expressed as a bridge ontology. The matched ontologies can remain as distinct logical and physical entities.

Using bridges or on-demand ontology conversion, the owners of the two ontologies can keep automatically updated about developments between their respective TF repositories.

Masters thesis on ontology matching in TF (Kun Ji).

Size and speed

The combined PULS/BioCaster epidemic ontology contains about 3000 classes and 20K instances, altogether about 300K triples. Their materialization (inference closure) contains about 1M triples.

The disease match query using a reasoner over the merged PULS-BioCaster ontology took minutes of real time on ordinary hardware. For web use, offline processing of ontology queries and caching of results is the way to go.

Repositories can get big, but queries and entries need to remain of manageable size ("small models" approach).

TF back end

A TermFactory website network consists of third party collaborative wiki/forum front end servers and TF back end servers. Each node can also operate standalone. Collaborative (forum/wiki) servers may pair up with back end servers, but that is not necessary. Collaboration platforms and TF services may or may not share infrastructure at some level; most likely, they are separate. The two subsystems can work independently.

The wiki/forum platforms combine formatted (form and editable graph) and free-form (text/read-only image) views to the same ontology entities. The formatted content is for human-machine communication, the free text for human-to-human communication. The forms are generated dynamically from ontology URLs. The back-end for the formatted views can be technically a TF repository, but not necessarily in the same db instance and most likely not the same model/graph as already established content.

With a web services based API, TF aims at a loose coupling between the TermFactory back end and alternative front-end designs and implementations. Front ends have short shelf lives, so it is risky to fix upon any particular platform. TF should provide a stable content design and workflow architecture, a generic back end implementation, and a flexible pluggable or mashup style front end concept plus a light demonstrator of the whole.

Starting from the back end, TF repositories appear to higher layers as ontology stores and endpoints. Above that, the TF query service allows submitting queries to the repository network. The TF edit service allows saving and retrieving entries for editing from repositories. The TF copy service helps move data between stores.

On top, there is the TermFactory front end API which provides mashups and services to third party TF front ends such as query form, text area editor and commenting system. TF front ends can then be implemented either natively as collaborative platforms, included into existing ones as server side plugins, or merged on existing pages as client side mashups .

A similar approach was taken in the Finnish KITES Multilingual Workbench/Desktop project. MLWB tools are typically used through organization’s own productivity tools such as Microsoft Word, business portals, and content management systems, or through their own user interfaces.

Show/hide MLWB Architecture MLWB Architecture

Another proposal for an ontology registry is Oyster, provided in the NeOn Toolkit . It uses peer-to-peer networking to contact repositories.

TF started out with "big web services", including the XML based SOAP, WSDL, BPEL, and UDDI standards. The advantage seemed to be that there was a lot of existing code. On the downside, the resulting overall system could be expected to become like a tectonic plate: huge and slow. There are many inertial components involved: Java, XML, OWL, Web...

A central ontology registry and workflow manager was first planned to be implemented using XML based web service techniques ("Big Web Services", including UDDI and BPEL). But a centralised registry architecture was subsequently abandoned as too complex and fragile. Luckily for TF: by now UDDI seems dead and BPEL is not much better off. (The Axis2 WSDL service layer is not doing much work either, to tell the truth.)

The following was an early cast for a TF web service system design: A QueryForm client forms a sparql-dl query from user input. A TF service contains an extension of the Pellet reasoner turned into a web service. The reasoning engine would use UDDI to map graph and term URLs onto TF instance URLs. As a reasoner proceeds, it sends subqueries for data through the BPEL service or to one or more "local" Query Services or to other TF repositories. The BPEL service schedules and optimises such subqueries and relays the results back. The TF service aggregates the results and returns them to the client. Labels like QueryForm, BPEL and UDDI are only suggestive of the type of functionality meant. Now much of the envisaged cataloguing work has been relegated to TF naming conventions and location mappings.

The TF needs that first seemed to call for centralized cataloguing and orchestrating services include these:

  1. How to tell which TF concepts belong to which TF ontology
  2. How to tell which TF ontologies are managed with which TF repository server
  3. How to delegate term queries down the line when ontologies import terms from ontologies managed by other repositories
  4. How to check that a given cached description of a term is up to date relative to the ontologies it imports from

On further thought, it started to appear that many of these needs can be met by applying existing web addressing technology. The key is to conform to standard Semantic Web addressing orthodoxy . The orthodoxy is to make web content associated to a given URL simply available at that location (see e.g. http://www.w3.org/Addressing/ ).

1. An example of the first question is how to locate the TF description of a concept uri like http://tfs.cc/exp/English in the TF repository system. The answer is now relatively simple. It is enough for a web client to send a GET request to the URL http://tfs.cc/exp/English . The request goes to the webserver at http://tfs.cc . Since this server is a TF repository server, it uses the server redirect capability to translate the incoming uri exp/English to http://localhost:8080/TermFactory/query?uri=http://tfs.cc/exp/English . This is a TF query service url which is handled by the local Tomcat application server at the tfs host. Further redirects can be defined in TF aliases . Such mappings can also register which ontologies should be searched for a given term URI or prefix name (QName).
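
As an illustration, the round trip can be tried from the command line. This is a sketch assuming a tfs.cc repository server configured as described and a client with curl available:

# ask for the concept URI and follow the server redirect to the TF query service
curl -L "http://tfs.cc/exp/English"
# the redirect target is a TF query URL such as
#   http://localhost:8080/TermFactory/query?uri=http://tfs.cc/exp/English
# whose response is the TF description of exp:English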

2. The second question is similarly a non-issue: by Semantic Web addressing orthodoxy , the ontology URL tells where the ontology is.

3. As for TF imports , TermFactory imports are identified by TF URLs. That is, a TF query import URI points to a TF service which returns the result of a query as an OWL model (possibly one cached in the repository database or one saved as text files on the server).

Loading an ontology at a node can indirectly cause relayed queries at any number of other nodes, possibly rounding back to the first node. In the OWL standard, cyclic import (causing mutual or self-import) is allowed and entails identity (equivalence). Cycling is checked in Jena by keeping a list of imported ontologies during model load and checking whether the url of a model to load is already on the list. TF imports are covered as a special case of Jena OWL imports.

4. The question about version updates is solved internally per server, as explained in the section on version checking . The main point is that each term query result brings along with it version information about imported ontologies. When a site updates an ontology, it updates the version information saved on the server's ont-policy file, where it gets looked up by the location mapping tfget facility.

So it is turning out that the TF repository network can be built without big web services. The functions of boxes labeled UDDI and BPEL in the blueprint are getting solved with existing uri redirection services running locally in each TF instance. Some background theory and a general outline of the design is given in a separate paper .

In version 1.2, a TF repository network exists as it were implicitly, through TF uri naming conventions. Each TF server operates independently; there is no privileged communication between such servers beyond mutual query service requests. It is not the servers that depend on one another, but the ontologies stored in their repositories. An ontology housed in one server becomes dependent on another TF server through importing some (small) ontology whose uri is maintained by that other server (simply because the imported ontology uri, directly or through location mappings, queries the other server).

Version 1.2 TF repositories work completely independently of one another, communicating only through repository URLs. Adding a new repository does not require any updating in the rest of the network. Removal of a repository just means that some queries will no longer return answers. The only way a TF repository hierarchy comes about (at the current state of the implementation, when no attention has yet been paid to business aspects) is through the internal logic of the participating ontologies. A top ontology is one which is imported (from) by many ontologies; a leaf ontology is not imported (from) by other ontologies.

(version 0.1) An initial implementation of the TF backend exists. It sends Jena SPARQL queries to a persistent TF Jena OWL database in MySQL over the web using Axis2 POJO (plain old Java object) web service calls. This implements a baby version of a standalone TF repository, i.e. the aqua, green, blue, and red boxes in the following figure.

(version 1.0) An improved implementation of the TF backend exists. It is implemented as a set of Axis2 web services serving a webapp / RESTful web api done with java servlets. The following is the current Axis2 listing of TF services. The orange boxes labeled UDDI and BPEL, i.e. repository network registry and workflow management, remain unimplemented.

TF Services

The TF web service API is implemented as an Apache Tomcat web application backed by a set of Axis2 web services also deployed under Tomcat. The services and the webapp can be hosted in different places. The webapp consists of servlets which process incoming requests and relay them to the Axis2 services. The servlets provide a REST style web API to the services with HTTP GET and POST. The TF back end services can be deployed individually or bundled together into a service group.

These are TermFactory services:

GateService user management and access control
QueryService content query and retrieval
EditService ontology editing
CopyService storage and copying
SparqlService SPARQL endpoint

Axis2 was designed for service-oriented architecture (SOA), or "big web services". Big web services start from the idea of remote procedure calling (RPC). A user of the service would just use a normal programming language API method call like result = method(args). The call is converted to an XML document ("bean") which is sent to the service. The return bean is converted back to a return value.

There is a web service definition language (WSDL) that supports automatic generation of service and client program code in different programming languages from a declarative specification. XML schemas for data formats similarly serve automatic translation of associated data into XML and back for transmission over the web and are used for data validation.

Newer web applications have moved away from RPC style web services and XML to RESTful web resources. RESTful web APIs are accessible via standard HTTP methods by a variety of HTTP clients including browsers and mobile devices. These RESTful web resources are just web urls. Such REST web services have a programming language independent web API that uses HTTP vocabulary. The Web API basically says what HTTP parameters (name-value pairs) to send in and what to expect back. Purist REST APIs use HTTP verbs as method names, less purist ones use request parameters. It remains for the users of the Web API to write the code that uses it. (There is an XML schema WADL for automatic REST code generation, but it is waddling against the stream, as people who shun SOA don't go for XML either.)

For what it is worth, TF web services have a WSDL description through Axis.

The TF AXIS2 webservice API is actually quite resource oriented. Each service has just two operations, setBean and getBean, and all the action is coded in bean fields.

Here is the QueryService WSDL for an example.

Show/hide TF Query WSDL

TF services share many options and services may call one another. For this reason the TF service beans are nested as follows.

class GateBean extends Bean
    GateOpts
        String conf
        String encoding
        String pass
        String readPermit
        String status
        String token
        String user
        String writePermit

class Bean
    String answer
    String message
    boolean dry

class QueryBean extends EditBean
    QueryOpts
        String bridge
        int depth
        String describeQuery
        boolean displayQuery
        String engine
        String input
        String interpretation
        String joiner
        int limit
        int offset
        String[] queryLines
        String queryText
        String queryType
        String[] repos
        String splitter

class EditBean extends CopyBean
    EditOpts
        String op

class CopyBean extends MainBean
    CopyOpts
        String contents
        String downAddr
        String upAddr

class MainBean extends Bean
    GateOpts
    HTMLOpts html
        String active
        String edits
        int indent
        String lang
        String lion
        String lynx
        String menu
        String original
        String root
        String schema
        String skin
        String style
        String template
    MainOpts
        String format
        String level
        String lex
        String map
        String prefixes
        String factor

GateService

GateService is consulted by the webapp servlets and the other TF web services to check authorization. TF users and their permissions are maintained in LDAP using OpenDJ DSMLServlet under Tomcat. TermFactory users are maintained through the webapp (GateForm servlet and gateform.jspx).

TF read and write permissions are expressed as regular expressions on repository URLs. For instance, the permit http://tfs.cc/.* matches all TermFactory URLs. Users have read or write access through TF on resources which match their permits. The TF permissions only concern repository access through TF services. Beyond that, and independent of TF, outside repositories may of course have their own access restrictions.

GateService implements the setGate and getGate operations. These operations send login parameters and receive an access token, or send an access token and receive user information and read/write permissions. After login, the webapp maintains session using cookies. The TF web services maintain session as well given the keep (-k) switch. The gate options are described below.

String conf TF conf (properties file) address
String encoding character encoding (default UTF-8)
String pass password
String readPermit regexp over IRI
String status user status
String token access token
String user username
String writePermit regexp over IRI
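
As an illustration, a login round trip through the GateForm servlet (the webapp front for this service) might look like the following sketch. It assumes the demo deployment at localhost and the default guest account; passing user and pass as query string parameters is an assumption here, not a documented contract.

# log in and store the session cookie
curl -c tf-cookies.txt "http://localhost:8080/TermFactory/gate?user=guest&pass=pass"
# reuse the session cookie in a subsequent query
curl -b tf-cookies.txt "http://localhost:8080/TermFactory/query?uri=ont:ctryCode"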

TF_USER and TF_PASS specify a default user whose permits are applied when TF queries are made with TF query URLs (for instance, http://localhost:8080/TermFactory/query?url=... ).
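
A minimal sketch of such a default user setting, assuming TF_USER and TF_PASS are set as properties in the TF conf (properties) file:

# hypothetical lines in a TF conf file
TF_USER=guest
TF_PASS=pass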

The public account does not permit editing of ontologies. Ontology editing must happen with explicit credentials through an edit form or through the edit service.

Main services

The rest are main services. Options common to main services are collected in the Main bean.

Main options:

String factor a resource rename command
String format format of ontology document to return
String level logging level
String lex name of GF domain lexicon
String map address of TF location mapping file
String prefixes address of TF prefix model

HTML options:

String active active ontology
String edits edited entry (name or contents)
String lang HTML output localization language
String lion HTML output localization vocabulary
String lynx HTML output hyperlink map
String original original entry (name or contents)
String root HTML output root filter
String schema HTML output reasoner schema
String skin HTML css/js
String style address of TF profile (properties file) for HTML settings
String template HTML output template

QueryService

QueryService serves queries against file or database TF repositories using TF Get and Query.

QueryService implements the setQuery and getQuery operations. These operations set query bean parameters sent to the service and get them after the query has been carried out, respectively. The fields of the query bean are described below.

String bridge name of ontology to use for stacked engine reasoner schema
int depth depth of recursion in TF DESCRIBE queries (a negative value stands for unlimited)
String describeQuery address of TF DESCRIBE query (overrides default)
boolean displayQuery display query text or no
String engine name of query engine to use
String input input text (table)
String interpretation; assignment of substitution parameters to inputs in boilerplate query
String joiner regex to split input columns by
int limit max size of query result (rows/triples)
int offset number of query results to skip from beginning
String[] queryLines resource name/s or address/es
String queryType; DESCRIBE, TRY, URL, URI, ECHO, QUERY (default)
String[] repos array of repository names
String splitter; regex to split input rows by

EditService

An EditService edit operation works on triple sets. A matrix triple set is edited by subtracting a delendum set and/or adding an addendum set to it. EditService implements four main actions: delete, add, edit, and save. edit bundles delete and add in one transaction.

The EditService parameters are conveyed in an Edit bean with two operations setEdit/getEdit and the following fields.

String op; string parsed as edit operation (a query parameter string)

The operation carried out is given in the operation field, matching the operations described for tfedit . The default opcode (assumed when no explicit opcode is given) is ed . For accessing the edit service with a URL see EditForm .

CopyService

CopyService provides TermFactory storage management. It offers various ways of storing a given ontology file (ontology or entry file) under different descriptions. In particular, it manages indexing of resources by URI path in TF WebDAV directories.

CopyService options are described below.

String contents contents to upload to a store
String downAddr address of store to download/copy from
String upAddr address of store to upload/copy to

The operation depends on the parameters present. If downAddr and upAddr are given, the document/model at downAddr is uploaded to upAddr. Else if contents and upAddr are given, the contents are uploaded to upAddr. If only downAddr is given, contents are downloaded from it.

SparqlService

TF SparqlService is a SPARQL 1.0 protocol compliant query endpoint for TF under Tomcat Axis2.

The W3C SPARQL protocol recommendation defines the SPARQL web service protocol by way of a WSDL 2.0 schema document sparql-protocol.wsdl . The protocol also specifies valid query requests, results and faults (errors) with the XML schema protocol-types.xsd . (This schema imports further schemas for its subelements.)

TF Visualizer

An elementary version of a TermFactory graph visualizer exists. The core graphics and web service code was written by Seppo Nyrkkö. TFVisu has now been integrated in the TF web service backend architecture. The visualizer service allows reading ontology URLs in TURTLE format and choosing seed resources whose neighborhood(s) the tool visualizes as an RDF style labeled line-and-circle diagram (circles represent nodes, lines arcs). Images requested from the visualizer are currently cached into an SQL database held at the web service client side.

Given two URL's representing two versions of the same ontology, TFVisu shows their join and differences in three diagrams. Example:

Show/hide Visualizer

TF servlets

The servlets of the TermFactory web application constitute sample clients for the TF Axis2 web services. In turn, they provide a RESTful endpoint for user agents and other front ends to access the services. There are three servlets in the TermFactory webapp.

In the demo implementation, the webapp is a collection of Apache Tomcat servlets accessible at address http://localhost:8080/TermFactory/ . The url http://localhost:8080/TermFactory/query starts an interactive form. In Tomcat, the TermFactory.war archive file gets extracted to the Tomcat WEBAPPS directory at deployment. To hot update the application, one can edit or replace files in this directory.

Following is a listing of request parameters understood by TF servlets beyond those already described for the commandline utilities. For those parameters that allow multiple occurrences in a query string (currently repos and factor), all supplied values will be applied. For the rest, the last occurrence wins.

Main form parameters:

option short form description values
page response page type form, page
button submit action
ui user interface language

GateForm servlet

The GateForm servlet provides access to the TF gate service that handles TermFactory user management. Without parameters it serves a login form at the webapp URL TermFactory/gate .

TermFactory QueryForm and EditForm forward to the GateForm servlet to check user credentials. The form checks credentials and starts a session for the logged in user. The settings page shows some login information of the logged in user and provides a form for a logged in user to choose a configuration file .

QueryForm servlet

The QueryForm servlet provides access to the TF query service . Without parameters it serves a standalone query form at the webapp URL TermFactory/query . The same URL with query parameters tagged on can be used to make a query or open the form with given initial settings.

A TF QueryForm url without parameters opens QueryForm with default settings. The form can be started with user defined initial settings using a url like http://localhost:8080/TermFactory/query?page=form&... where the subsequent parameters are some of the TF query parameters.

QueryForm parameters
Sample TF QueryForm query string parameters

Here are sample TF query string parameters .

Query string Explanation
?url=http%3a%2f%2ftfs.cc%2fowl%2fTFS.owl Fetch TFS.owl using local TermFactory service's location mappings
?uri=ont:ctryCode Fetch or construct an entry for TFS instance uri ont:ctryCode
?q=http%3a%2f%2ftfs.cc%2fsparql%2fconvert.sparql query using sparql script convert.sparql
?q=DESCRIBE+%3finst+WHERE+%7b+%3finst+rdf%3atype+ont%3aConcept+%7d query using script DESCRIBE ?inst WHERE { ?inst rdf:type ont:Concept }
?engine=SPARQL use SPARQL query engine
?format=HTML return results in HTML format
?active=http%3a%2f%2ftfs.cc%2fTFS.owl markup contents of TFS.owl editable in HTML entry
?original=http%3a%2f%2ftfs.cc%2foldTFS.owl markup contents of oldTFS.owl deleted in HTML entry
?lynx=http%3a%2f%2ftfs.cc/etc/maps/lynx.n3 map resource hyperlinks in HTML using this location map
?lion=http%3a%2f%2ftfs.cc%2ftf-TFS.owl get HTML entry localization strings from this URL
?lang=fi localize into Finnish
?W=relabel use TF factor utility to relabel resources in the result model.
?W=deblank use TF factor utility to remove blank nodes from the result model.
?W=reblank use TF factor utility to restore blank nodes to the result model.
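
These query strings are appended to the webapp query URL. For instance, a complete request combining the base address of the demo deployment with two of the parameters above might look like this (a sketch, not a guaranteed live URL):

curl "http://localhost:8080/TermFactory/query?uri=ont:ctryCode&format=HTML"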

EditForm servlet

The EditForm servlet provides access to the TF edit service . Without parameters it serves a standalone editor at the webapp URL TermFactory/edit .

A TF EditForm url without parameters opens the edit form with default settings. The form can be started with user defined initial settings using a url like http://localhost:8080/TermFactory/edit?page=form&... where the subsequent parameters are some of the TF edit parameters.

The basic edit operation of the edit service can be requested through the EditForm with query parameters as follows:

http://localhost:8080/TermFactory/edit?active=test.tf3&original=test2.tf3&edits=test3.tf3

The request does the edit operation, which returns the result of editing active model test.tf3 with original test2.tf3 and edits test3.tf3 as an HTML document. The order of the parameters is arbitrary in this request format.

More complicated edit requests can be made by specifying a stack of desired edit operations with query parameters p and op (currently one of del,add,ed ). The operations are carried out in reverse Polish fashion:

http://localhost:8080/TermFactory/edit?&p=.&p=.&op=del&p=.&op=add

This deletes the model indicated by the second p parameter value from the first and then adds the third one to the result. The edit operations can be interspersed with factor operations in the form of rw=<operation> . Each factor pops the model on top of the stack and pushes the rewritten model back in its place.
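
For instance, a hypothetical request that deletes one model from another and then relabels the result (the model names old.ttl and removals.ttl are placeholders; relabel is one of the factor tasks listed for the query service):

http://localhost:8080/TermFactory/edit?p=old.ttl&p=removals.ttl&op=del&rw=relabel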

EditForm parameters

Following is a listing of query string parameters understood by EditForm.

option short form description values
button submit action
user TF username
pass TF password
token session token
encoding character encoding UTF-8, UTF-16
format f graph output format
factor W factor task
input i input document or resource TF name or address
engine e query engine SPARQL, ARQ, PELLET, Mixed, Stacked
original O HTML writer original
edits E HTML writer edits
template t HTML writer output template
root R HTML writer output root/s
schema S2 HTML writer schema
active A HTML writer active ontology
lion L HTML writer localization terms
lang l HTML writer localization language
lynx H HTML writer hyperlink map
q query address
downAddr download address
upAddr upload address
op query string
page response page type form, page
ui user interface language

The EditForm doQuery action implements two of the query types defined in QueryForm, viz. query for a document by address, and query for one or more named resources (preferably by address, failing that, by description from given repositories). The doSave action saves edit area contents at the download address, the active address, or the input, in this order of preference. The doCopy action copies content to the upload address (upAddr). The doCheck action validates the contents of the edit area (roundtrips the html through rdf). The doEdit (commit) action updates the active ontology with the edits. The active ontology must be a writable address.

The difference between doSave and doEdit is this. doSave tries to replace the model at the save location with the edits. The save location is the first one given of the download location (downAddr), the active location (active), or the input location (input). If the save is to the active location, the active triples of the edits are saved. doEdit updates the active model (active) with the difference between the original and the edits, deleting from it the triples deleted from the original and adding to it the triples added to the original.

Note that TF models may import other models. When a named model is fetched from a dataset with readAll flag set (form checkbox 'no imports' unchecked), its imports are also fetched from whatever location they may be redirected by the applicable location mappings. If the imports closure is saved back to the address where the named model came from, the imported triples get added to the named model. This may not be what is wanted. In the editor, this can be circumvented by setting the dataset named model address as active before the fetch. Then the HTML reader marks the triples coming from the imports as readonly, and omits them from the save.

With appropriate location mappings, a write-only input address can be shadowed by a writable address keeping the edited versions. (To revert to the original version, query the original location with location mappings turned off.)

TF front end

The TermFactory front end includes the TF front end toolkit and the demo site.

TermFactory toolkit

The TermFactory toolkit is meant to help add TermFactory tools to existing collaborative work platforms. It includes a demo installation of TermFactory on MediaWiki.

TF tries to minimize TermFactory specific adaptations of Semantic Web tools. The adapted tools should remain usable for their original purposes as well, and unadapted third party tools should remain usable for TF. Here are a few cases in point:

  • The TF editor is implemented as a plugin for the open-source JavaScript WYSIWYG editor CKEditor .
  • TF HTML entries are valid HTML5, and CKEditor is a generic HTML editor. Thus it can be used to edit any HTML content besides TF entries.
  • TF entries are a special case of RDF (OWL) models. The back end is able to convert any RDF (OWL) model into HTML. Thus the TF Editor can be used as a general purpose RDF/OWL ontology editor.
  • The TF default term entry template is a special case of entry template. The editor can be adapted to other formats, like the WordNet lexicon format, by changing editor and back end parameters.

The web toolkit as it appears in the TermFactory demo front end is shown below.

Front end toolkit

The current version of the front end is meant to help build plugins and services for third party platforms.

MediaWiki is being used as a testbed.

Show/hide TF toolkit

TF toolkit
TF Webapp

TF webapp

This section describes the web application accessible from the webapp root address http://tfs.cc/TermFactory/ .

The index page of the TermFactory web application - the factory yard - is a decorated link list to the various parts of the app and other parts of the TF front end: the gate, some TF query urls, the query form, the editor with different settings, the TF Mediawiki, and this manual.

Show/hide TF yard screenshot

TF factory yard

TF webapp live

GateForm

The gate form

TermFactory user management is described above and detailed in the section on the TF gate service . The GateForm servlet in the webapp provides a restful endpoint to the gate service.

TermFactory QueryForm and EditForm forward to the GateForm servlet to check user credentials. GateForm runs a login form from the webapp URL TermFactory/gate . The gate form checks credentials and shows some login information of the logged in user.

Show/hide TF GateForm login screenshot

gate login screenshot

The minimalist GateForm login form only has slots for username and password and a button for changing the locale of the user interface.

Show/hide TF GateForm userinfo screenshot

gate form userinfo screenshot

The GateForm user info screen shows information related to the currently logged in user, including read and write permits to TF repositories. The bottom part of the screen shows the TF conf in use and the non-default values of settings from that conf. It also contains a form to change the user password and another to select a user conf. A conf can be named explicitly or selected from a list. The options in the list are read from the conf collection whose address is given by the TF_CONFS property in the current conf, or by default from the user's conf collection in the DAV.

Show/hide TF GateForm admin section screenshot

gate form admin section screenshot

For a user with admin rights, the GateForm shows an additional section with forms for adding, deleting, and listing TermFactory users. When a user is added to TermFactory, they are given a home directory in the TermFactory DAV subject to the same user/pass credentials.

Admin scripts

The GateForm administrative section contains a form for running TermFactory maintenance scripts from home:/etc/scripts over the web. Currently, the collection contains the script wiki.script that generates TF mediawiki pages from TermFactory listings generated with sparql queries. Sample title listing queries are ont2wiki.sparql and exp2wiki.sparql, stored in the etc/sparql directory. When a script is run, its messages are shown in the message line and its output is shown on the gate form just under the Run script section.

Currently, the TF user directory is maintained with the OpenDJ open source LDAP directory service and accessed using the DSML Servlet deployed in Tomcat.

GateForm live

QueryForm

The query form

The TF query form page exemplifies the functionalities of the query service.

Show/hide query form screenshot

screenshot of query form

With parameter page=form , the TF QueryForm servlet answers with a query form. Parameter ui=fi produces a Finnish version. Other parameters become initial settings for the form.
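For instance, the following request (with host and port as in the earlier examples) opens the query form with a Finnish user interface:

http://localhost:8080/TermFactory/query?page=form&ui=fi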

Command scripts

Besides sparql queries, the TF query form also runs command scripts written in php, the apache server scripting language. Users' command scripts are kept in the same etc/scripts directory as the sparql queries and named with file extension .script . A source script is cloned (linked or otherwise) to an executable php file with extension .script.php . The scripts are executed by the server hosting the user WebDAV home directories.
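For example, on a unix host the clone can simply be a symbolic link created in the scripts directory (the script name is illustrative):

ln -s report.script report.script.php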

Command scripts can be run as such or with inputs. Inputs to command scripts are given in the same way as inputs to boilerplate queries. Just like queries, scripts can be written to take default parameters (INPUT1, INPUT2, ...) and/or named parameters. TF settings for a script can also be specified with the query form settings page. Advanced options such as prefixes and describe-query are not available from the settings overlay, but they can be set in the TF conf.

An example of a TF command script is menu.php.txt , which runs a set of queries to populate the TermFactory menu. It is described in the workflow section .

The query form has a whole page of hidden settings. This part gets overlaid on the form from the Settings button. The settings are described in the query form legend .

Show/hide query settings

screenshot of query settings

The query form legend, accessible from the form by clicking the Guide link twice, explains the layout and functions of the TermFactory Query Form.

EditForm

The editor

Besides exposing the functions of the TF edit service, EditForm serves a standalone editor.

Editor live (local)

Show/hide editor screenshot

screenshot of editor

The legend, accessible from the form by clicking the Guide link twice, explains the layout and functions of the TermFactory editor.

On Query, the editor fetches a prefabricated TF document into the textarea editor (the URL button), or makes a TF describe query for a resource URI (the URI button). The results of the query are shown in the editor in TF xhtml format. If no active model is set, the input is shown as is. If an active model is set, resources and statements found in the active model are boldfaced and the rest are greyed as read-only. If a localization language is given, resource labels in the entry are localized in that language using an available localization vocabulary.

On Check, the edits are read off the text area and written back to it. With luck, the content is what was intended, though the layout may look different.

On Save, the edited version is saved to the download location if any, the active location if any, or the input location, in this order of preference.

On Commit, the changes are carried over into the active model. The point is that the contents in the editor may come from many ontologies, as the result of a multi-repository or multi-site query. The extra information in the query result may be helpful to decide how to edit the active content, but only the currently active ontology is subject to change on the basis of the edited active content. This makes it possible to edit selected parts of a large ontology without dragging the whole ontology into the editor.

The edit form contains, besides the term edits textarea housing the CKEditor, a menu of further settings. This part of the form pops up from the Set button on the CKEditor toolbar.

Show/hide edit settings

screenshot of edit settings

The settings menu is a CSS overlay exposing a hidden part of the edit form. Its looks are determined by the edit form's stylesheet editform.css .

The controls of the menu are described below.

no routing Check this to disable TF alias
no imports Check this to exclude ontology imports
input URI/s of the document/s or resource/s to edit
template This and the other HTML options below are explained here .
schema
active
lion
language
hyperlinks
Query The Query button carries out two types of TF queries, URL and URI queries, and writes the results of the query in the edit area. Use the QueryForm for other query types.
document/s Push this button to fetch one or more documents (URL query)
resource/s Push this button to generate an entry for one or more resources (URI query)
Check This option tries to read the edits into an RDF model and write it back to HTML. It can be used to check that the edits are valid and express what they were meant to.
blanks This option can be used to factor blanks in the edits as fresh constant URIs (remove), factor such URIs back to blanks (restore), or to generate descriptive TermFactory URIs for anonymous concept/term/expression instances (factor).
Save With Save, the results of edit actions are written to the download address (provided it is a writable TF address).
Copy Copy button copies the contents of the download address to the upload address (provided it is a writable TF address).
Cancel

The dynamic repositories defined for TF can be used as a persistent cache to hold edits between edit sessions. If the editor sits on some collaborative platform like Mediawiki, shared versions can be saved on the platform.

The subsequent options from template to hyperlinks instruct the HTML writer . They are usually defaulted so that one need not set them. The defaults are shown in the menu. Unless set by user, they are fetched from the entry's meta triples or html header.

The radio button provides the first two options of the TF query form. The document button does a URL query, i.e. uses Get to fetch a document by name. (If TF routing is used, the name can be anything defined in the location mappings. If not, it should be an accessible URL.) The resource button calls for a URI query for the resources whose names are listed as input. The editor Query button currently only provides these two query types. When a more complex query is called for, use the query form and save the results somewhere to fetch into the editor.

The Commit button sends the contents of the editor to the edit servlet that relays it to the EditService back end to be committed in the repository at the active address. It is best to first check the edits with Check to be sure that only intended changes get committed.

The blanks select menu gives choices for rewriting blank nodes in an entry. Option deblank replaces blank nodes with blank resource names of form urn:blank:... . This gives the blanks a temporary persistent identity during editing. By the RDF standard, RDF reads/writes do not preserve blank IDs, and updating RDF containing blanks is a complex problem. Option reblank factors blank URNs back to normal blanks. Option relabel tries to use the TF factor utility to invent descriptive labels for terms and expressions according to the TF descriptive label naming convention .
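Schematically (the property, literal and blank label are invented for the example), deblank rewrites

_:b0 ex:baseForm "kala" .

as

<urn:blank:b0> ex:baseForm "kala" .

and reblank reverses the rewrite.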

The Save button uses CopyService to upload the edits to the given upload location (a TermFactory web directory URI).

CKEditor

The current X/HT/ML editor used is the CKEditor textarea editor by F. Knabben (the version in use in 2013 is 4.1). CKEditor is a wysiwyg editor, i.e. editing of HTML happens through a CSS stylesheet "skin". The Source button on the default CKEditor toolbar shows the underlying HTML. The Save as file button extracts the edits into a document that can be saved locally as a file from the browser. (The Preview button is the same except for icon and title.)

CKEditor is not perfect as an HTML structure editor. Normally CKEditor only shows the visible text. Button Show blocks highlights preformatted text literals. Button Source shows the underlying html. Lists (ul) and list items (li) can also be selected with the breadcrumb path underneath the editor window. To copy-paste a new item on a list, select a list item, copy it (Ctrl-C), put the cursor at the beginning of the item to insert after, and paste the copy (Ctrl-V). If the copy lands too low or high, the Decrease indent / Increase indent buttons may fix it. CKEditor has a long-standing bug about copying table rows. One workaround is to copy a complete table and delete unwanted rows.

Use find/replace to replace all occurrences of incorrect information.

TermFactory extends CKEditor with a TF specific plugin.

It adds to the CKEditor toolbar some extra buttons and a menu.
Set opens the overlay menu for editor settings
Query shortcut to the Query submit button on the settings menu.
Edit shortcut to the Edit submit button on the settings menu.
Upload shortcut to the Upload submit button on the settings menu.
Menu opens the TermFactory menu.

The toolbar shortcut buttons apply the settings in force in the edit form. They just save the trouble of opening the overlay when there is no need to change the settings. The TF Insert menu is also on the CKEditor context menu (right button of mouse).

Since the TF editor is a fully featured HTML editor, it is always possible for a user to enter content anywhere in an entry by just typing it there, or by cutting and pasting content anywhere and then editing it. Once one is at home with some terminology and entry format, that may well be the most convenient way of working. However, should one forget what can go where, or just want to avoid typos, there is the TF Insert menu.

Show/hide TF Insert menu

CKEditor TF Insert menu

The purpose of the TF Insert menu is to help users choose properties and values to insert in a given position. The menu is opened with the current property and/or value selection copied to the input fields. A click on a property opens an autocomplete list of subproperties of the property or value. Further clicking on the selection repeats the process of zooming in until the desired property appears. When a property is selected without a value, a click on a value input makes the menu look for classes in the range of the currently selected property.

A click on the value field opens a menu of subclasses of the current value to choose from. Clicking on the selected value starts the subclassing process again with the selection. If the selected class has no subclasses, the menu looks for instances of the class. If the selected property is a datatype property, the menu tries to list literal values for the property.

The TF Insert menu has one submit button called Insert and two exit buttons, Cancel and OK. The Insert button uses the input value to query an insertion template from current layout template. The insertion template is a predefined bit of HTML associated to the currently selected value. The Insert button tells TF Insert menu to insert the template as the value of the input property in the property list containing the current selection in the text area. If no template exists for the input value, the input value as such gets inserted. Insertion is undoable (select Undo from CKEditor toolbar or hit Ctrl-Z).

The OK button closes the menu but remembers the current inputs for the next time the menu is opened.

The Cancel button closes the menu and clears the shortcuts. The next time, the canceled tab opens with its startup values.

TF autocomplete lists
instances.json autocomplete list from classes to their instances
literals.json autocomplete list from datatype properties to their literal values
ranges.json autocomplete list from properties to their range classes
subclasses.json autocomplete list from classes to their subclasses
subprops.json autocomplete list from properties to their subproperties

Autocomplete lists can be generated from any ontology using sparql queries in directory etc/sparql/ . The relevant queries include

instances.sparql literals.sparql ranges.sparql subclasses.sparql subprops.sparql

An example command line to produce an autocomplete list for subclasses from TF schema:

pellet4tf query -F -q home:/etc/sparql/subclasses.sparql -f JSON -F2 home:/owl/TFS.owl > subclasses.json

Autocomplete lists need not match any existing ontology hierarchy; they can be customised to whatever seems practical. A hierarchy helps when one does not know the ontology. A flat autocomplete list is enough when one knows what to look for. The customisation can happen at any level: one can prepare custom autocompletion ontologies, custom autocompletion queries, or just custom autocompletion lists.

Users may choose among shared or private menus and skins and point TF to them with the TF conf options TF_MENU and TF_SKIN. The shared ones are in TermFactory webdav. Private ones are fetched by TermFactory query:

TF_MENU /TermFactory/webdav/etc/menus
  or TF_MENU /TermFactory/query?u=dav:/home/guest/menus
TF_SKIN /TermFactory/webdav/etc/skins/tf2html
  or TF_SKIN /TermFactory/query?u=dav:/home/guest/skins/foo

Besides confs, menus and skins to use to edit a given TF document may be specified in the document with TF meta triples of form

[] meta:menu </TermFactory/webdav/menus> .
[] meta:skin </TermFactory/query?u=dav:/home/guest/skins/foo> .

A meta element in the document overrides the conf setting.

Menus and skins must come from the TermFactory site to conform to browser same-origin policy. Users can apply their own menus by using a TermFactory query url as proxy. See the second sample TF_MENU setting above.

Site wide menus and skins for the TF editor are backed up in the source directory $TF_HOME/ws/servlet/etc/menus . The default menu is at the root of this directory; others are stored by name in its subdirectories. Some site wide menus get copied to the webapp jar from the webapp directory /TermFactory/etc/menus at build time. Others can be added at runtime using the TermFactory webapp's Tomcat webdav servlet. You need a Tomcat webdav password.

The systemwide menus and skins are at the TermFactory webapp runtime resources webdav url /TermFactory/webdav/etc . Administrators can edit shared menus and skins in the TermFactory webapp through the TermFactory webdav url /TermFactory/webdav/etc/menus/ .

Show/hide TermFactory webapp dav

TermFactory webapp dav

TF inserts

The Insert menu allows inserting canned RDF content in the edit area at the cursor, by name, from the current layout template. Inserts are queried from the layout template using a TF query and formatted to match the current skin. The template query makes a DESCRIBE query for the content of the value field in the menu. What gets inserted depends on the current selection. If it is at a value, a value is inserted. If it is at a property, a new property is added.

TF front end localization

The TF dialog front end, including the insert menu, can be localized to the current interface language. Both the interface texts and the autocomplete options get localized. The dialog localization happens by reference to the json file named in javascript variable TFConfig.menu.lion . (By default, it is etc/menus/lion.json .)

TF menu content generation

One of the points of TermFactory is that it can be adapted and localized for new ontology schemas and languages by simply editing some more ontologies with it. This section describes one way of facilitating the process. As an example, we use the TermFactory location mapping file format. A TF alias (lm) file is an RDF file of mapping rules. An xhtml layout template for the lm format is defined in etc/templates/lm.ttl . The vocabulary and semantics of location mappings are described in the schema etc/templates/lms.ttl . The schema vocabulary is localized in etc/templates/lion-lms.ttl . These documents form the basis for generating the TF menu contents for editing location mappings in TF.
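Assuming the same query pattern as the subclasses example above, autocomplete lists for editing location mappings could be generated along these lines (the exact paths are illustrative):

pellet4tf query -F -q home:/etc/sparql/subprops.sparql -f JSON -F2 home:/etc/templates/lms.ttl > subprops.json
pellet4tf query -F -q home:/etc/sparql/ranges.sparql -f JSON -F2 home:/etc/templates/lms.ttl > ranges.json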

The CKEditor plugin is in fe/TermFactory/js/term.js . The TF Insert menu is defined in fe/TermFactory/js/dialog.js .

There is a tutorial on writing CKEditor plugins in Woofie .

In ckeditor (v. 3.5.1), css color and text-decoration settings for html anchor elements are declared !important (in the ckeditor plugin about.js ), with the effect that anchor elements in user content always show with default formatting (blue underlined). A workaround is to supply user-defined anchor settings with a css !important declaration as well. See etc/tf2xhtml.css for an example.
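A minimal sketch of such a stylesheet rule (the values are illustrative; see etc/tf2xhtml.css for the settings actually used):

a { color: inherit !important; text-decoration: none !important; }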

It is simplest to use absolute uris to link style files to content edited in the ckeditor. In the ckeditor instance bundled in TermFactory webapp, relative uris resolve to the webapp's root directory ($CATALINA_HOME/webapps/TermFactory/).

CKEditor and its predecessor FCKEditor have a built in template facility that can also be used for TF purposes.

TF resource collections

The TermFactory front end is customised for different users and tasks using resource collections of configurations and scripts. This section documents them.

TermFactory collections

TermFactory collections are used to organize different types of documents for access through the web application. Here is how to turn a web directory into an indexed collection. Create a directory, say confs/ , put a copy of the php script home:/etc/apache2/idx.php in the directory and point TF_CONFS to the directory (equivalently, to the index file). In this case, the TF webapp produces a localized index of the expected form for the files in the directory. For a filesystem directory, its TF collection is looked for in a hidden file .idx in the directory. Below is an example of the TF collection index file format. The local name of the item is separated by a tab from a legend. A legend is shown as is except when it starts with a hash. In that case, the legend is split at language tags and the legend in the current user interface language is shown.

services.idx	#en common services#fi tavallisia palveluja#sv vanliga tjänster
stores.idx	#en common stores#fi tavallisia varastoja#sv vanliga förråd
stores2.idx	#en other stores#fi muita varastoja#sv övriga förråd

TermFactory collections can also be virtual. Virtual collections are documents with extension .idx having the above format, namely a two-column tab-separated table with a url on the left and a legend on the right. The urls may be absolute or relative to the pathname of the virtual collection. Collections of type .idx can contain absolute addresses or aliases resolved by TermFactory, while collections of type .idx.php or / are directory listings which are resolved by the directory URL.
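For illustration, a small virtual collection file might look like this (the urls and legends are invented for the example; the columns are tab-separated):

confs/user.properties	#en my default conf#fi omat oletusasetukset
http://tfs.cc/TermFactory/webdav/etc/confs/site.properties	site wide conf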

Index file home:/etc/apache/index.php placed in a web directory allows viewing its TF collection in a more human readable way. A collection index may look like this:

Show/hide collection index

TF collections (.idx files) and listings (.tsv files) are by format tab-separated values (TSV) files that can be generated by TF queries. While .idx files are used for browsing and manual selection, .tsv files are fed as input to queries. If a listing file has multiple columns, all but the first column are ignored, so collections may be used as listings by just changing the file extension. The script etc/apache2/idx.php creates a TF index file for a filesystem directory with the commandline command php idx.php > .idx .

The TF WebDAV directory

The TermFactory WebDAV directory is a shared editable place for users to keep ongoing work and TF conf settings. Each user has password protected access to their own editable subfolder dav/home/user on the DAV webserver from anywhere. The root dav directory is seen only by the dav root user, but its subdirectories named by scheme (file, http, ...) are readable by all. URLs saved in the dav are indexed by pathname in the dav. The DAV directory can be accessed with a WebDAV capable client, for instance the Nautilus file manager in Linux or Windows Map Network Drive. The location of a TermFactory site's default DAV directory is given with TF_DAV . Its initial value is http://localhost/home/TF_USER . TF substitutes the current user name for TF_USER in the url when accessing a DAV directory.

The option TF_CONFS tells the webapp where to look for conf files. The value of this setting is supposed to be a php file (extension .php ) that constitutes an index to a conf collection. The format is lines of urls or relative filenames of properties files, each optionally followed by a tab and a verbal description of the conf. Relative filenames in an index are resolved against the path of the index. With index files one can keep many conf collections in one directory or create an index that spans more than one directory.

The option TF_SCRIPTS tells the webapp where to look for sparql scripts. The value of this setting is supposed to be a php file (extension .php ) that constitutes an index to a script collection. The format is lines of urls or relative filenames of script files, each optionally followed by a tab and a verbal description of the script. Relative filenames in an index are resolved against the path of the index. With index files one can keep many script collections in one directory or create an index that spans more than one directory.

The option TF_MAPS tells the webapp where to look for location mappings.

The option TF_MENUS tells the webapp where to look for editor menu configurations.
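For illustration, a user conf is a plain properties file; a minimal one might contain settings like the following (the values are invented for the example, written in the same key value style as the TF_MENU and TF_SKIN examples above):

TF_CONFS /TermFactory/webdav/etc/confs
TF_SCRIPTS dav:/home/guest/etc/scripts
TF_MENU /TermFactory/webdav/etc/menus
TF_SKIN /TermFactory/webdav/etc/skins/tf2html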

After login, the gate form looks for a default conf for user USER at the address TF_DAV_URL/home/USER/etc/confs/USER.properties . The conf is loaded if found, and the user's conf collection (TF_CONFS) and sparql scripts (TF_SCRIPTS) are looked up from it. If it is not found, system defaults are used. The default layout of a user's TF configuration files is the following.

Show/hide DAV tree
etc/
  aliases/
    foo.n3
    bar.n3
    ...
  menus/
    props.json
    foo.html
    bar/
      props.json
      bar.html
    ...
  confs/
    foo.properties
    bar.properties
    ...
  queries/
    foo.sparql
    ...
    bar/
      bar.sparql
      ...
  skins/
    foo.html
    foo.js
    ...

Note that menus and queries have the default set at the root of the directory and alternative sets in subdirectories. This is only a suggestion; alternative arrangements work too. One can use any accessible web place to save TF configurations.

In order of decreasing generality and stability, there are the following places to look for TF configurations:

  1. TermFactory site filesystem TF_HOME home:/etc
  2. TermFactory webapp root TF_HOST/TermFactory/webdav/etc
  3. TermFactory WebDAV root TF_HOST/dav/etc
  4. TermFactory WebDAV user home TF_HOST/dav/home/user/etc
  5. any other web accessible place

Canned queries

A TF conf points the query form to a collection of canned command and sparql query scripts at the address given in its TF_SCRIPTS property. By default, it is the user's dav home subdirectory etc/scripts/ . Names of canned queries are not right or wrong, only more or less convenient for human users. Frequently used canned queries can be more conveniently nicknamed with TF aliases. The query form scripts pulldown sorts scripts by comment line. In addition, a script can be tried out by downloading and querying it with default settings specified in the query header.

Sample TF sparql scripts are collected to the TermFactory sample sparql script directory home:/etc/scripts . The sample scripts obey the naming convention explained above.

Canned query headers

The TermFactory query form uses comment lines at the head of a script to build the script collections shown on the form. The first line of the script header is by convention the title (local name) of the script. It is not used by the query form. Subsequent lines starting with the comment start hash immediately followed by a language code plus whitespace are used to form a localized pull-down list of scripts, described and sorted by the comment line in the user interface language chosen for the form. The full url of the selected script is shown in the pulldown option's tooltip. The scripts can be retrieved by name in the autocomplete list of the names/addresses text box.

Show/hide query script pulldown menu

screenshot query script pulldown menu

Canned query settings

A comment line starting with # followed by a TermFactory query string (e.g. # ?I=foo ) is used by QueryForm to fill out default settings for a sample query with the script. The query form button Load loads the settings from this header line into the query form fields, so that the user can try out the query directly and see what parameters they might want to change. The query settings line can be used to give default parameters to boilerplate queries. Another way to obtain the same effect is to write a plain query and use it as boilerplate with the defaults as boilerplate parameters. The settings line can be URL encoded or plaintext (it will be urldecoded when parsed).
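Put together, a canned query header might look like this (the script name, comments and settings line are illustrative, and the query body is only sketched):

# list-finnish-expressions.sparql
#en list Finnish expressions
#fi listaa suomenkieliset ilmaukset
# ?I=wordnet+
SELECT ?expression WHERE { ... }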

TF Wiki

There is a MediaWiki extension TFTab for embedding TermFactory in MediaWiki. The TFTab extension adds to MediaWiki a tab that embeds the TF webapp in an inline frame (iframe) on a Special page. When the tab opens, it shows by default the TF editor with the title of the mediawiki page entered as the query address. The TF entries and edits are maintained in some TF repository outside of the wiki or they can be copy-pasted from or to the wikipage.

Show/hide TF Wiki

TF wiki

TF Comment

The TF wiki platform, as well as the TermFactory webapp components, are set up to receive and generate Disqus comments, best threaded by the TF resource uri given in the wiki page title. In disqus terms, the TF resource uri is used as the disqus identifier instead of the hosting page url. In this way, comments on the same resource coming from different platforms get collected together and can be reviewed as a group.

Naming canned scripts

A good script name describes the query it names. Script names may loosely follow the SPARQL syntax:

QUERY-MODIFIER-TYPE-RESTRICTION-FROM-LIMIT

where
  QUERY indicates query type or purpose (e.g. collect | construct | count | describe | list | replace | select | ...)
  TYPE reflects the result of the query
  MODIFIER and RESTRICTION pre and post modifiers reflect the WHERE clause of the query
  FROM = from-X(-and-Y)... reflects the FROM clause of the query if any
  LIMIT reflects sorting and limiting modifiers if relevant

Some sections of the name can of course be empty. The query type collect is used for CONSTRUCT queries which select a subgraph of the graph pattern in the WHERE clause, and type list for one-column SELECT queries. Type extract is for scripts that extract information from resource names.
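For instance, names coined on this pattern might look like the following (purely illustrative):

list-finnish-terms-without-definition-from-wordnet.sparql
collect-concepts-with-labels-from-wordnet-and-icd10.sparql
count-expressions-by-language.sparql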

Populating the wiki

The TF MediaWiki demo installation comes prepopulated with some sample TF term ontologies, including the en-fi WordNet and the ICD-10 nomenclature in six languages (en,de,ru,sv,fi,la).

Populating the wiki means that wikipage templates are generated for relevant term resources in an ontology, so that the wiki can be used as a search index to the entries and as a platform for collaborative work on those resources.

The wikipage template at present is just a link list to external dictionary and terminology resources s.v. the resource in question, including a link to the TermFactory editor that loads an entry for the resource from a TermFactory repository. The entry can also be loaded through the TermFactory TFTab.

Below is a workflow for generating from an ontology a listing of the relevant resources in it, and sample scripts for generating wikipages for the listed resources. The example uses the en-fi wordnet.

  1. load ontology to an owlim repository.
    cd $TF_HOME/owlim/
    drop-owlim owlim-se-wordnet
    ./owlim-loader.sh config=$TF_HOME/etc/assemblies/wordnet.ttl preload=$TF_HOME/owlim/preload/wordnet directory=$TF_HOME/owlim context=
  2. generate listing of resources to import to mediawiki from the repo
    cd $TF_HOME/fe
    pellet4tf query -F -f2 TSV -N -1 -q wnlabels-tsv.sparql wordnet+ > wnlabels.tsv
  3. run mediawiki import script with $filename set to wnlabels.tsv
    cd $TF_HOME/fe
    php import-wordnet-tsv.php

The generated mediawiki titles do not correspond one-to-one to the TF iris, because the RDF IRI and Mediawiki title reserved character sets are different. Restricting titles to their intersection would make wiki titles and iris look less legible than each of them has to be separately. Instead, a bidirectional mapping is defined between TF iris and Mediawiki titles in the wikipage import script to maximise legibility on both ends.

QueryForm Legend

This legend explains the layout and functions of the TermFactory Query Form. More detail is in the guide. When the guide is open, a Ctrl-click on selected form elements scrolls the guide to an appropriate place.

The form has three fieldsets, Query , Copy , and Comment .

The Query fieldset offers a variety of query types, executed by corresponding buttons. When the pack checkbox is selected, the query is not executed, but a query url is formed out of the form parameters and placed in the download field. The address is also shown in the message area.

Map is for resolving one or more TermFactory addresses. Each address can be a URL or an alias that gets mapped to a URL by the TermFactory location mapper.

Get is for downloading one or more documents from TermFactory addresses.

If there is just one document and its format does not matter, the document is downloaded as such; in that case it can be in any format.

If there are multiple documents to get, the documents are read as RDF and merged together (with their imports, if indicated). The merged result is converted to the format selected in the Settings.

Describe is for creating entries (descriptions) for one or more resources by name from repositories given below.

The entry (description) is queried from repositories specified in the parameter field or in the repos textarea, using the TermFactory DESCRIBE query facility. The result is shown as a page in requested format.

Query runs a sparql query or a php command script.

Names of resources to describe, addresses of documents to fetch, or scripts to run are entered in the names/addresses input field.

More than one item (separated by the joiner) can be entered here. Names are autocompleted from the current script collection.

Addresses of documents and scripts can also be selected from a pulldown menu.

The pulldown menu shows the current conf's collection of aliases, indexes, commands and query scripts. The pulldown sorts items by description. The addresses are shown in the tooltip. The type of item (collection, command script, graph, table) is indicated by icons. The Pick button copies the selected address to the address field for inspection or editing. The Load button loads the script or query from the selected address. The script is loaded into the query text area if it is a sparql query. Command scripts cannot be run from the text area. A script can be tried out by downloading it and querying it with the default settings written in the query header.

When there is a choice, an address entered in the address field wins over a selection from the pulldown, and a query text entered in the text area wins over both.

If a SPARQL query has FROM (NAMED) clauses, they are used. Otherwise, repositories to query are entered in the repositories field.

One datastore or endpoint, or one or more graphs separated by joiner can be entered. A repository address which resolves to a listing (file type .tsv ) is read as listing of more addresses. If there are no repos, the current conf's TF_REPO property is looked up. If a sparql endpoint address (URL ending in /sparql ) is entered in the repository field, the selected query is sent there.

If a download address is given, the result is saved there. If there are several results, they are named by the input string and saved in the download address, which then should be an RDF repository or a writable directory. If checkbox merge is selected, the results from the iterated queries are merged into one result. Otherwise, the results are printed sequentially on the result page.

A datastore to save query results to can be entered in the stores field.

One datastore or endpoint or file repository can be entered. If there is no store, the current conf's TF_STORE property is looked up. If a sparql endpoint address (URL ending in /sparql ) is entered in the repository field, the results are sent there as a SPARQL update query. Otherwise, a page containing the results is returned.

When storing a plain query, the storage location must be the address of a writable RDF graph: a writable file location or the name of a graph in an RDF store. When a boilerplate query generates several results, the storage location must be an RDF datastore or a writable directory. Each result is named by its input parameter and saved in the store.

The inputs and parameters textareas allow iterating any of the four basic query types over multiple rows of input given in the input textarea. The basic query is used as boilerplate.

Inputs to the boilerplate can be given as two-dimensional table or list. The usual case is where the values form a table, each row aligned in a fixed number of corresponding columns, say:

		dbp:Fish fi kala N
		dbp:Bird fi lintu N

Here, we want two queries, one for each row, each time doing four substitutions, one for each column, so as to produce two query results. The default substitution parameters are INPUT1, INPUT2 ,... but parameter names can also be supplied explicitly in the parameter field, say:

		ont        lang        base        cat

for parameters occurring in the script or query text.

If checkbox across is checked, each row lists values for one parameter, and all combinations of values for the different parameters are cross-multiplied and queried in turn.

		dbp:Bird dbp:Fish ...
		en fi sv ...

This arrangement takes all combinations of a concept on the first row and a language on the second row, producing 2x3 = 6 results, doing two substitutions per query. Optional explicit parameter names form a matching column:

		ont
		lang

An item which resolves to a listing file (file type .tsv ) is read as listing of more inputs. Explicit row joiner and column separator can be specified in the settings.

The Jobs button reports about ongoing background jobs for the user. The Reset button returns to the settings of the most recent submit. The Clear button clears the visible part of the form (not the settings).

The offset and limit fields define a window to show in the query's result set. The window starts from offset and its maximum size is limit.

The Query button submits the query. If it has results, they are returned as a result page. If not, the form is returned. Messages are written in a message area at the bottom of the Query fieldset.

The Settings button opens an overlay for further choices (query engine, result formats, character encoding, and input punctuation). The current choices for query engine and table and graph output formats are shown up front next to the Settings button.

On top of the Settings overlay is a row of checkboxes.

  • The no routing checkbox disables TermFactory location mapping.
  • The imports checkbox includes documents imported by the chosen TermFactory documents in the query.
  • The merge checkbox aggregates the results of an iterated query together.
  • The keep repos checkbox keeps the current query engine loaded for faster queries from the same repositories.
  • The overwrite checkbox gives permission for a copy to overwrite an existing version.
  • The job checkbox puts a query in the background and returns to the form. When the job is done, the results get written at the download address. Jobs can be monitored with the Jobs button.

The bridge field is for the address of a bridge schema. It is used to translate a query sent to a SPARQL endpoint to the endpoint's vocabulary, and/or applied by the reasoner of the Stacked query engine.

The TF HTML style parameters (template, root, schema, active, localization, language, and hyperlinks) are defaulted from the downloaded content or from current TF style or conf, but they can also be set individually. An explicit style parameter entails format HTML (unless overridden by an explicit format parameter).

  • template and root choose the orientation of the RDF graph and the resources and properties shown.
  • schema bridges third party vocabulary to TF for HTML layout.
  • active includes the editable part of an entry.
  • original for showing version differences in HTML
  • edits for showing version differences in HTML
  • Resources are labeled by IRI by default, but the label can be localized to the user's language by supplying a language and/or localization .
  • Resources are hyperlinked by IRI by default, but are rewritten with location mapping given in hyperlinks.
  • The tree checkbox shows a graph as a tree by suppressing inverse properties. This option is the default for the editor.
  • The global names checkbox prints full resource names (iris) in the TSV format (the default is to abbreviate names with known prefixes).
  • The factor pulldown tells TF to factor the requested RDF document, replacing names of blank resources according to selection. The plain factor option just reformats.

The graphical look of a HTML document depends on the document's stylesheets or skin . By the default skin, TermFactory concepts are shown blue, expressions orange, and terms green. Active content is shown dark, read-only content transparent. Nodes in the graph can be collapsed by clicking. Resource names are shown when hovered over.

TF terms can be written into Grammatical Framework (GF) parser/generator lexicons. The GF output format needs a name for the domain lexicon and optionally the language code of a GF concrete grammar.

Inputs to boilerplate queries are split with regular expressions in joiner and separator between columns and rows, respectively. The default values are inline whitespace and newlines, respectively.

The Copy fieldset has two fields, download address and upload address .

The download address is where query results get saved. Existing repositories are not overwritten unless overwrite is checked.

  • The Download button brings the contents of the download address to the browser.
  • The EditForm button opens the TermFactory editor with the download address plugged in.
  • The Copy button copies contents of the download address to the upload address.

The Comment fieldset inserts a Disqus commenting system mashup. The desired discussion thread identifier is entered in the indicated input field. If it is left empty, a system default is used.

Editor Legend

This legend explains the layout and functions of the TermFactory Edit Form. More detail is in the guide. When the guide is open, a Ctrl-click on selected form elements scrolls the guide to an appropriate place.

EditForm allows editing ontology content in TermFactory's RDF/HTML format using the well-known JavaScript textarea editor CKEditor. The editor can be used to edit any RDF content (any HTML or text, for that matter, since the editor is an extension of a generic text/html editor).

RDF/HTML for TermFactory is an HTML representation of RDF triples. Though optimised for term ontologies, it can be used to view and edit any RDF content in HTML.

CKEditor is overlaid on a form textarea where it controls the text entered in the text area. The CKEditor button controls have been left at their default values and are mostly self-explanatory or documented elsewhere.

The main addition is the TermFactory menu bar (buttons labeled Set through Insert) added to the CKEditor menu.

Button Set opens an overlay with more settings. Button Insert opens a popup menu. The rest of the buttons are shortcuts to buttons on the Settings menu, detailed below.

The Insert menu offers a context sensitive selection of RDF property and value templates to insert into the entry. The menu allows users to insert canned RDF properties and values into the edits instead of typing them in.

The property and value fields are autocompletable. Double clicking a selection opens the next lower level of options.

The insert menu content can be adapted to a given ontology schema and localized to the user language. The property and value fields are autocompleted on user input following a user definable taxonomy from more general to more specific concepts. The taxonomy and the canned values are customisable. Both can be generated by script from TermFactory schema content. The menu content can be localized to appear in the user's working language. The menu localization can also be generated by script from TermFactory content.

The Settings overlay opens a query form and further options.

At the Query button, content is fetched to the editor using the TF query back end. The query form is a simplification of the TermFactory query form. There is a radio button for downloading documents and another for downloading or generating entries. More complex queries can be constructed in the query form and brought to the editor as query addresses or saved documents. A new query starts a new editing session unless the same original checkbox is checked.

The editor's default format is HTML. The HTML style parameters (template, root, schema, active, localization, language, and hyperlinks) are documented in the TF manual. Generally, they are defaulted from the downloaded content or from current TF style or conf. An explicit style parameter entails format HTML (unless overridden by an explicit format parameter).

  • template and root choose the orientation of the RDF graph and the properties shown.
  • schema bridges third party vocabulary to TF for HTML layout.
  • active delimits the editable part of an entry.
  • Resources are labeled by URI by default, but the label can be localized to the user's language by supplying a language and/or localization .
  • Resources are hyperlinked by URI by default, but they can be redirected with hyperlinks.

The decoration (colors, fonts, etc.) shown in the editor depends on the entry's stylesheets or skin . By the default skin, TermFactory concepts are shown blue, expressions orange, and terms green. The edits may contain both read-only and active content. Active content is shown dark, read-only content transparent. Nodes in the graph can be collapsed by clicking. Resource names are shown when hovered over.

Edits can be checked using the Check button. If the blanks pulldown is selected, anonymous resources are given temporary or descriptive names, or temporary names are reverted to blanks according to the choice.

The Save button saves the edits at given download address (not on local machine: CKEditor's buttons do that). The Commit button updates the active ontology with the edits, deleting from it what got deleted and adding what got added in the editing. The Diff button shows the changes in the edits from the original. Save the diff to continue from uncommitted edits later.

The Copy button copies the contents of the download address to the upload address. The QueryForm button opens the query form prefilled with edit form settings.

The Comment fieldset inserts a Disqus commenting system mashup. The desired discussion thread identifier is entered in the indicated input field. If it is left empty, a system default is used.

TF system design

The TF system

This figure summarizes the component types of a TermFactory system graph. Each type of component is exemplified with a concrete instance (one actually used in the reference implementation). From right to left, we start with the front end user interface and end with the repository back end.

Show/hide TF system

TF system

Workflows

This section documents workflows and best practices in TF based terminology work. There is a general section on division of labor and the terminology workflow, and a section on how the work happens on different platforms. The latter is further divided into sections on professional terminology tools, the TF wiki platform, and TF panel inserts on other platforms.

The increase in user created content and interactivity gives rise to issues of control over the community and ownership of the jointly created content, including fundamental legal issues such as intellectual property and ownership rights.

Collaborative terminology work vs. traditional terminology

The following table compares traditional terminology work per work phase to the TF workflow. TF methods do not oust traditional ones, but complement them.

Phase Traditional TermFactory
Source collection Books, journals Community awareness, web harvesting
Term candidate collection Perusal of documents Community awareness, web content statistics
Term choice Committee Community voting
Concept analysis Drawing Ontology editing
Term description Dictionaries and grammars Expression ontology
Compilation Text editing Query language
Publishing Publishing house Transformation pipeline

A virtual expert community is a natural accumulation point for links to relevant documentary sources. The TermFactory user base generates a steady flow of term proposals simply by failing to find terms in the repository. Community portals can also be actively harvested for terms not covered by the repository using ContentFactory information retrieval, fact extraction and term / keyword extraction tools. The user base's preferences for term usage can be monitored in the community using original or active voting schemes.

Such statistics can be shown as is (descriptive terminology) or used to support authoritative decisions in grading or (de)selecting term candidates (normative terminology, harmonisation). Term choices can also be evaluated against the expression ontology of the complete TF repository system so as to avoid inter-sector term clashes, accidental homonymy, and to enhance terminological consistency. The primary source for TF terminology is the repository system (or a subset of authoritative servers).

When separate compilations are needed for special purposes, for publication on some restricted channel or the like, the desired subset can be retrieved from the system using suitable repository queries. The results of the queries, being in a structured standard form, can be rendered in desired formats according to publication channel using fully automatic transformation pipelines.

Another table of comparisons can be made according to the roles and responsibilities of the actors in the traditional versus TF terminology workflow.

Dimension Traditional TermFactory
Actors Terminologist(s), subject expert(s), term committee Terminologist(s), subject expert(s), users
Workgroup methods Person-to-person interviews, committee meetings Web community
Schedule and motivation Fixed schedule for a fee Shared interest
Roles and authority Fixed roles and authority Based on shown merit
Languages Few with a fixed definition language Many languages
Sources Predetermined Wide variety
Purpose Predetermined Not restricted
Lifecycle Project duration Continuous

In the traditional workflow, terminologies are made to order by professional terminologists working either in-house (relatively few organizations can afford to have in-house terminologists) or as a paid service by terminology organizations or language services. As a separate process, terminology is not a very profitable business, owing to the high manpower cost of quality terminology and the relatively low priority given to terminology by clients. Worldwide, there are not many companies or organisations solely specialising in terminology work; many of the existing ones are small and/or get public funding. The profits from high quality terminology for a company or organisation whose main business is elsewhere are long term and indirect, and usually relegated under some low profile budget item.

It makes more sense to integrate terminology with some more pressing mainstream business item or service. Thus larger language services can do terminology as a part of a wider language service offering, like language training, translation, multilingual content management, or the like.

Part of the high cost of terms comes from expensive expert time. Expert committees are ill attended and tend to converge slowly because terminology disagreements often hide territorial conflicts. While no technology can remove these problems, collaborative terminology platforms can help reduce these bottlenecks by allowing a wider range of opinions and more freedom from calendar conflicts. The fear might be raised that web communities are too amorphous and unruly to contribute reliable input. This criticism uses the free-for-all web communities like Wikipedia as the thought model. Even Wikipedia is surprisingly good at correcting itself, but there is no need to think that TF expert communities are anonymous or free for all. Membership of a given community may well be restricted and the contributors identified. Technology is not the limiting factor here.

An interesting possibility is to use experience-based attribution of authority by the community, a method that works well in many web-based expert communities. An expert whose advice has been useful for a large number of users gains authority points (counting thanks from users, or some other measurable proof of quality). The community can use this information to sort or weigh alternative opinions on a given topic. Although one can imagine many ways in which this idea might not work, in practice it does, perhaps surprisingly.

Another possible criticism is that the expert community can only have public discussions. Nothing prevents one-to-one or other private discussion en petit comité among members of an expert forum, either on the forum or separately, whether in real time chat, forum or email style.

It is important to separate technology from its application. A case in point is the question of motivation. Professional terminology work is done for a fee, while one of the seductions of the idea of collaborative terminology work is that it might happen for a shared interest, on a tit-for-tat basis - or for indirect profit or gratification, like the visibility and authority gained from contributing to communities like Wikipedia or LinkedIn. But all this is again independent of the technology. It is quite as possible to build a TermFactory installation working on a fee-for-service basis. For information providers (terminologists, experts) the payment is based on content rendered and approved by the buyer(s). For information consumers (terminology users), the payment could be on a subscription basis, or measured by term downloads. Different user roles, rights and obligations can be defined as is done in many existing collaborative work platforms.

Professional terminologists are trained to follow standard work methods and quality norms in their work. If the work is spread over a larger number of subject experts and users, quality may deteriorate. True; this is one of the areas where TF needs to come up with innovative solutions. Roughly, we need a regimented upward flow in a TF repository system, rather reminiscent of the Wikipedia echelons of quality checking. When term suggestions come in, they are made available to the community with a low reliability status. When they have been revised by the community and perhaps passed inspection by a select group of experts, they get a higher status. At the end, they may be adopted by the repository authority as part of the "normative" core of the collection. (This process is one where the users too may get promoted to a higher status.)

One crucial quality requirement for professional terminology is source indication. This plays an important role in the subsequent authorisation of terminology. Here, information technology can help a lot. First, the platform may make it easy for users to add source indications in a way that is at minimum traceable to the source and at best follows a given norm (say, using web addresses and/or one of the many bibliography formats in use). Second, terminology suggestions can be evaluated by the reliability and authority of the source and/or the proposer(s).

TF terminology repositories are as a rule an ongoing concern. This is a major improvement on the status quo, where typically a terminology project is started when earlier terminology collections are hopelessly obsolete, and by the time the new collection is out it is already obsolescent itself. A TF repository is open 24/7, so it can catch the newest fads. On the other hand, the repository system is under revision control, so that earlier editions can be kept accessible, or an authorised edition can be kept alongside the nightly release. It is also possible to open and close special purpose scheduled terminology projects using the repository system as a base of data. There is no need to throw away anything that works, just because there are more alternatives available.

Workflow

User roles in the TF workflow

In a professional terminology use scenario, participants in the terminology workflow can assume different roles. Here is one scenario.

  1. General users search terms (Query) and comment on them (Comment).
  2. Contributors and moderators check the comments (Comment) and make new term and term modification proposals (Wiki).
  3. Terminology/linguistic professionals review the proposals (Wiki), then make changes to RDF repositories (RDF editor).

Higher quality implies more restricted access rights. These streams of data are in principle platform neutral, so that future users can set up their favored toolkits to consume and also contribute to the professional data. In practice, platform needs differ, since the lower streams are textual, while the top level consists of formal ontologies. Who is allowed to bear which roles is left open: in some scenarios, everyone can be a moderator, in others there can be some hierarchy of authority.

TermFactory roles around MediaWiki

Here we consider the division of labor in a Wiki based TermFactory community and the associated user roles.

Approved TermFactory-specific descriptive information about some resource, say 'cat', meant for human consumption can be entered on a wiki page associated with the resource. The content of the page should agree with the TermFactory entry about this sense of the word 'cat'. If it does not, one or the other needs updating.

Also on the page is a snapshot of the TermFactory entry for this term resource that goes with this version of the page. It is easy to browse to an earlier version of the entry and the accompanying text using MediaWiki page history.

One plausible division of labor is the following.

  • Unregistered users can browse the open collection.
  • Unregistered users can discuss terms using the Disqus commenting system.
  • Unregistered users can discuss MediaWiki pages through the MediaWiki Discussion tab.
  • A wiki page in an open collection can be edited by self-registered (i.e. identifiable) users.
  • A wiki page in a closed collection can be edited by other-registered (i.e. approved) users.
  • The TF entry associated to a resource can be edited by approved users.
  • An ontology can be edited with a TF entry by authoritative users (owners of a collection).

Each echelon can be further divided by category, with specialists for each category.

TF localization


TF classes and properties like concept, term, and expression are also described in TF as (instances of) concepts, terms, and expressions. This means that TF is capable of reflection: it can document and localize itself.

A standard-issue TF term ontology tf-TFS.owl provides translations of TFS descriptive classes and properties. This information is used to change language in the TF front end tools.

Note how the property names in the example below have been partially localized into Finnish (the coverage depends on the localization vocabulary). This is not interface localization, but content localization (though the distinction becomes relative).

Show/hide localized WordNet entry

The localization was done with the following command line:

factor HTML lang=fi lion=../owl/wn/TFwnLion-fi.owl schema=../owl/wn/TFwn.owl ../io/entity.ttl < entity-lion.html

The TF to HTML writer and its converse, the HTML to TF parser are parametrised with a localization model and lang code. Given these parameters, a TF model is serialised in HTML with property and value URIs labeled with strings taken from the localization model and language. Conversely, when an edited HTML document is parsed back into TF, labels used in the HTML document are mapped back to TF URIs by looking up corresponding localization terms from the localization model.

TF JSON content localization files

The localization file passed to the HTML writer/reader can also be a JSON format file as produced by the Perl + SPARQL script localize described below. The localization file's full URL can be given as a parameter. The json file extension must be .json . A location for localization files can be specified with TF option TF_LION in the conf . The default location is where the ontologies are, home:/owl/ . If a bare relative path of form something.json is explicitly given, it is resolved against the value of TF_LION . If no localization file is given, then the value of TF_LION is used as the URL. If this URL is a directory and localization lang is (say) fi, then filename fi.json is resolved against it.

JSON localization files can be generated from localization ontologies with the help of Perl script io/bin/localize . It uses the localization script template etc/scripts/lion.sparql . The format of TF json localization files is the json result set format defined in SPARQL 1.1 recommendation, produced by the Jena ARQ ResultSetFormatter.outputAsJSON from a SPARQL SELECT query.
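The lion.sparql template itself is not reproduced in this document. As a rough sketch only, a SELECT query yielding inst/label bindings of the kind shown below could be written against the TFS term and expression vocabulary; the shipped template may differ in details.

PREFIX term: <http://tfs.cc/term/>
PREFIX exp:  <http://tfs.cc/exp/>
# sketch: collect every designation of every referent as a language tagged label
SELECT DISTINCT ?inst ?label
WHERE {
  ?term term:hasReferent ?inst ;
        term:hasDesignation ?d .
  ?d    exp:baseForm ?base ;
        exp:langCode ?lc .
  BIND (STRLANG(?base, ?lc) AS ?label)
}
ORDER BY ?inst ?label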

Here is an example abbreviated from the result of running localize on the TF Finnish localization vocabulary http://tfs.cc/TF/owl/fi-TFS.owl . The top level keys are "prefixes", "head", and "results". Prefixes are namespace prefixes from the localization vocabulary. Head is the sequence of the SELECT query variables. Results are a set of "bindings", and each binding binds some of the query variables, giving the RDF type of the variable and the actual value of the variable.

The json localization source file format is multilingual. There is a minimized monolingual json localization file format compiled from the source format that is used in TF front end localization .

{ "prefixes": { "rdfs": "http://www.w3.org/2000/01/rdf-schema#", "owl": "http://www.w3.org/2002/07/owl#", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "meta": "http://tfs.cc/meta/", } , "head": { "vars": [ "inst" , "label" ] } , "results": { "bindings": [ { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/TermFactory" } , "label": { "type": "literal" , "xml:lang": "en" , "value": "TermFactory" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/TermFactory" } , "label": { "type": "literal" , "xml:lang": "fi" , "value": "Termitehdas" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/TermFactory" } , "label": { "type": "literal" , "xml:lang": "zh" , "value": "术语工厂" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/Value" } , "label": { "type": "literal" , "xml:lang": "en" , "value": "anything" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/Value" } , "label": { "type": "literal" , "xml:lang": "fi" , "value": "kaikki" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/Value" } , "label": { "type": "literal" , "xml:lang": "zh" , "value": "什么" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/insert" } , "label": { "type": "literal" , "xml:lang": "en" , "value": "Insert" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/insert" } , "label": { "type": "literal" , "xml:lang": "fi" , "value": "Lisää" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/insert" } , "label": { "type": "literal" , "xml:lang": "zh" , "value": "插入" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/property" } , "label": { "type": "literal" , "xml:lang": "fi" , "value": "ominaisuus" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/property" } , "label": { "type": "literal" , "xml:lang": "en" , "value": "property" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/property" } , "label": { "type": "literal" , "xml:lang": "zh" , "value": "属性" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/value" } , "label": { "type": "literal" , "xml:lang": "fi" , "value": "arvo" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/value" } , "label": { "type": "literal" , "xml:lang": "en" , "value": "value" } } , { "inst": { "type": "uri" , "value": "http://tfs.cc/meta/value" } , "label": { "type": "literal" , "xml:lang": "zh" , "value": "值" } } ] } }

Localization queries

Shown below is a sample SPARQL query that finds the localizations of TermFactory schema concepts in Finnish.

Show/hide sparql example

pellet4tf query -F -d -q sparql/select-tfs-concepts-with-lion-fi.sparql Query: ----------------------------------------------------- # select-tfs-concepts-with-lion-fi.sparql #en table of concepts with their Finnish localizations from given ontology and its localization vocabulary #fi taulukko käsitteistä suomennoksineen annetussa ontologiassa ja sen lokalisaatiossa PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX ont: <http://tfs.cc/ont/> PREFIX term: <http://tfs.cc/term/> PREFIX meta: <http://tfs.cc/meta/> PREFIX exp: <http://tfs.cc/exp/> SELECT DISTINCT ?inst ?exp FROM NAMED <http://tfs.cc/owl/TFS.owl> FROM NAMED <http://tfs.cc/owl/fi-TFS.owl> WHERE { GRAPH <http://tfs.cc/owl/TFS.owl> { ?inst rdf:type ont:Concept . } OPTIONAL { GRAPH <http://tfs.cc/owl/fi-TFS.owl> { ?term term:hasReferent ?inst . ?term term:hasDesignation ?exp . ?exp exp:langCode "fi" . } } } ORDER BY ASC(?inst) ASC(?exp) ----------------------------------------------------- Query_Results ( 74 answer/s limit 100 ) ----------------------------------------------------------------------------------------------------------- | inst | exp | =========================================================================================================== | http://tfs.cc/exp/Adjective | | | http://tfs.cc/exp/Adjective_phrase | | | http://tfs.cc/exp/Adposition | | | http://tfs.cc/exp/Adposition_phrase | | | http://tfs.cc/exp/Adverb | | | http://tfs.cc/exp/Adverb_phrase | | | http://tfs.cc/exp/American_English | | | http://tfs.cc/exp/Appellation | | | http://tfs.cc/exp/Chinese | http://tfs.cc/exp/fi-kiina-N | | http://tfs.cc/exp/Designation | | | http://tfs.cc/exp/English | http://tfs.cc/exp/fi-englantia-N | | http://tfs.cc/exp/Finnish | http://tfs.cc/exp/fi-suomi-N | | http://tfs.cc/exp/Language | | | http://tfs.cc/exp/Noun | http://tfs.cc/exp/fi-substantiivi-N | | http://tfs.cc/exp/Noun_phrase | | | http://tfs.cc/exp/Phrase | | | http://tfs.cc/exp/Preposition | | | http://tfs.cc/exp/Russian | | | http://tfs.cc/exp/Verb | | | http://tfs.cc/exp/Verb_phrase | | | http://tfs.cc/exp/baseForm | http://tfs.cc/exp/fi-perusmuoto-N | | http://tfs.cc/exp/case | | | http://tfs.cc/exp/catCode | http://tfs.cc/exp/fi-sanaluokka-N | | http://tfs.cc/exp/gender | http://tfs.cc/exp/fi-suku-N | | http://tfs.cc/exp/hasHead | http://tfs.cc/exp/fi-edussana-N | | http://tfs.cc/exp/headPosition | http://tfs.cc/exp/fi-edussanan_paikka-N | | http://tfs.cc/exp/langCode | http://tfs.cc/exp/fi-kielitunnus-N | | http://tfs.cc/exp/number | http://tfs.cc/exp/fi-luku-N | | http://tfs.cc/exp/romanisation | http://tfs.cc/exp/fi-latinalaistus-N | | http://tfs.cc/exp/text | http://tfs.cc/exp/fi-teksti-N | | http://tfs.cc/meta/Description | | | http://tfs.cc/meta/affect | | | http://tfs.cc/meta/definition | | | http://tfs.cc/meta/frequency | http://tfs.cc/exp/fi-taajuus-N | | http://tfs.cc/meta/hidden | http://tfs.cc/exp/fi-piilotettu-A | | http://tfs.cc/meta/link | http://tfs.cc/exp/fi-linkki-N | | http://tfs.cc/meta/register | | | http://tfs.cc/meta/see | http://tfs.cc/exp/fi-katso-V | | http://tfs.cc/meta/status | | | http://tfs.cc/meta/usage | | | http://tfs.cc/ont/Chemistry | | | http://tfs.cc/ont/Concept | http://tfs.cc/exp/fi-käsite-N | | http://tfs.cc/ont/Construction_industry | | | http://tfs.cc/ont/Content | | | http://tfs.cc/ont/Country | | | http://tfs.cc/ont/Data | | | http://tfs.cc/ont/Geography | http://tfs.cc/exp/fi-maantiede-N | | 
http://tfs.cc/ont/Information_and_communication_technology | | | http://tfs.cc/ont/Input | | | http://tfs.cc/ont/Language | | | http://tfs.cc/ont/Language_industry | | | http://tfs.cc/ont/Language_technology | | | http://tfs.cc/ont/Linguistics | | | http://tfs.cc/ont/Multilingual_language_technology | | | http://tfs.cc/ont/Software | | | http://tfs.cc/ont/Terminology | | | http://tfs.cc/ont/ctryCode | http://tfs.cc/exp/fi-maatunnus-N | | http://tfs.cc/term/Abbreviation | | | http://tfs.cc/term/Acronym | | | http://tfs.cc/term/Context | | | http://tfs.cc/term/Definition | | | http://tfs.cc/term/Description | | | http://tfs.cc/term/Example | | | http://tfs.cc/term/Explanation | | | http://tfs.cc/term/LongForm | | | http://tfs.cc/term/ScientificTerm | | | http://tfs.cc/term/ShortForm | | | http://tfs.cc/term/Term | http://tfs.cc/exp/fi-termi-N | | http://tfs.cc/term/Variant | | | http://tfs.cc/term/hasReferent | http://tfs.cc/exp/fi-käsite-N | | http://tfs.cc/term/referentOf | http://tfs.cc/exp/fi-termit-N | | http://tfs.cc/term/seeFalseFriend | http://tfs.cc/exp/fi-petollinen_ystävä-N | | http://tfs.cc/term/status | | | http://tfs.cc/term/termOf | | -----------------------------------------------------------------------------------------------------------

The boilerplate query select-concepts-with-lion-by-lang.sparql parametrizes this query over languages. Such boilerplate queries can do the job of commandline scripts over the web.

http://localhost:8080/TermFactory/query?q=home:/etc/scripts/select-concepts-with-lion-by-lang.sparql&i=fi&r=http%3A%2F%2Ftfs.cc%2Fowl%2Ffi-TFS.owl&e=MIXED&f2=JSON
How localization works

Localization in TF is nothing special: it is just another term lookup using SPARQL. Classes that are used in TF to describe terms and expressions, like ont:Concept , exp:Expression , term:Term , exp:Noun , are in no way different from any others for which TF provides multilingual terms and expressions. Such terms are associated with these classes as properties of their respective representative instances (concepts, i.e. class puns). It is straightforward to write sparql queries that localize these - or any other - terms in the language(s) of interest.

For OWL object properties, like meta:hasSubjectField , term:seeFalseFriend , and data properties like exp:baseForm , a link to localizing vocabulary has to be provided in a different (but analogous) way. For OWL 2, this is not difficult because OWL 2 allows punning classes and properties, so that a class or property can be metamodeled (given first order properties) with a homonymous instance. For backward compatibility with OWL 1, one can use the TF punning convention to pun a property with a paronymous instance that has the same local name as the property but a minimally different, conventionally related namespace prefix. For instance, meta:hasSubjectField is punned with instance meta0:hasSubjectField . The pun is then localized as before. (The punning convention was dropped from TFS starting with version 3.9.)

It is straightforward to write a sparql query that fetches for any given object (class, property, or instance) its localized names in different languages. Here is an example.

Show/hide localization example

pellet4tf query -F1 -q sparql/select-term-keys-by-base-i.sparql -F2 ../owl/tf-TFS.owl Query_Results ( 100 answer/s limit 100 ) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | base | exp | term | concept | ================================================================================================================================================================================================ | "A" | http://tfs.cc/exp/tfs-A-N | http://tfs.cc/term/tfs-A-N_-_exp-Adjective | http://tfs.cc/exp/Adjective | | "American English" | http://tfs.cc/exp/en-American_English-AN | http://tfs.cc/term/en-American_English-AN_-_exp-American_English | http://tfs.cc/exp/American_English | | "CN" | http://tfs.cc/exp/ISO-CN-N | http://tfs.cc/term/ISO-CN-N_-_ont-China | http://tfs.cc/ont/China | | "China" | http://tfs.cc/exp/en-China-N | http://tfs.cc/term/en-China-N_-_ont-China | http://tfs.cc/ont/China | | "Chinese" | http://tfs.cc/exp/en-Chinese-AN | http://tfs.cc/term/en-Chinese-AN_-_exp-Chinese | http://tfs.cc/exp/Chinese | | "English" | http://tfs.cc/exp/en-English-AN | http://tfs.cc/term/en-English-AN_-_exp-English | http://tfs.cc/exp/English | | "FI" | http://tfs.cc/exp/ISO-FI-N | http://tfs.cc/term/ISO-FI-N_-_ont-Finland | http://tfs.cc/ont/Finland | | "Finland" | http://tfs.cc/exp/en-Finland-N | http://tfs.cc/term/en-Finland-N_-_ont-Finland | http://tfs.cc/ont/Finland | | "Finnish" | http://tfs.cc/exp/en-Finnish-AN | http://tfs.cc/term/en-Finnish-AN_-_exp-Finnish | http://tfs.cc/exp/Finnish | | "Insert" | http://tfs.cc/exp/en-Insert-V | http://tfs.cc/term/en-Insert-V_-_meta-insert | http://tfs.cc/meta/insert | | "Lisää" | http://tfs.cc/exp/fi-Lisää-V | http://tfs.cc/term/fi-Lisää-V_-_meta-insert | http://tfs.cc/meta/insert | | "N" | http://tfs.cc/exp/tfs-N-N | http://tfs.cc/term/tfs-N-N_-_exp-Noun | http://tfs.cc/exp/Noun | | "P" | http://tfs.cc/exp/tfs-P-N | http://tfs.cc/term/tfs-P-N_-_exp-Adposition | http://tfs.cc/exp/Adposition | | "Suomi" | http://tfs.cc/exp/fi-Suomi-N | http://tfs.cc/term/fi-Suomi-N_-_ont-Finland | http://tfs.cc/ont/Finland | | "TermFactory" | http://tfs.cc/exp/en-TermFactory-N | http://tfs.cc/term/en-TermFactory-N_-_meta-TermFactory | http://tfs.cc/meta/TermFactory | | "Termitehdas" | http://tfs.cc/exp/fi-Termitehdas-N | http://tfs.cc/term/fi-Termitehdas-N_-_meta-TermFactory | http://tfs.cc/meta/TermFactory | | "V" | http://tfs.cc/exp/tfs-V-N | http://tfs.cc/term/tfs-V-N_-_exp-Verb | http://tfs.cc/exp/Verb | | "adjective" | http://tfs.cc/exp/en-adjective-N | http://tfs.cc/term/en-adjective-N_-_exp-Adjective | http://tfs.cc/exp/Adjective | | "adposition" | http://tfs.cc/exp/en-adposition-N | http://tfs.cc/term/en-adposition-N_-_exp-Adposition | http://tfs.cc/exp/Adposition | | "affective value" | http://tfs.cc/exp/en-affective_value-N | http://tfs.cc/term/en-affective_value-N_-_term-affect | http://tfs.cc/term/affect | | "aihealue" | http://tfs.cc/exp/fi-aihealue-N | | | | "aihealueet" | http://tfs.cc/exp/fi-aihealueet-N | http://tfs.cc/term/fi-aihealueet-N_-_meta-hasSubjectField | http://tfs.cc/meta/hasSubjectField | | "aika" | http://tfs.cc/exp/fi-aika-N | http://tfs.cc/term/fi-aika-N_-_sem-hasTime | http://tfs.cc/sem/hasTime | | "alias" | http://tfs.cc/exp/fi-alias-N | http://tfs.cc/term/fi-alias-N_-_owl-sameAs | http://www.w3.org/2002/07/owl0#sameAs | | "any" | http://tfs.cc/exp/en-any-D | http://tfs.cc/term/en-any-D_-_meta-Value 
| http://tfs.cc/meta/Value | | "arvo" | http://tfs.cc/exp/fi-arvo-N | http://tfs.cc/term/fi-arvo-N_-_meta-value | http://tfs.cc/meta/value | | "base form" | http://tfs.cc/exp/en-base_form-N | | | | "basic form" | http://tfs.cc/exp/en-basic_form-N | http://tfs.cc/term/en-basic_form-N_-_exp-baseForm | http://tfs.cc/exp/baseForm | | "classes" | http://tfs.cc/exp/en-classes-N | http://tfs.cc/term/en-classes-N_-_rdf-type | http://www.w3.org/1999/02/22-rdf-syntax-ns0#type | | "comments" | http://tfs.cc/exp/en-comments-N | http://tfs.cc/term/en-comments-N_-_rdfs-comment | http://www.w3.org/2000/01/rdf-schema#comment | | "concept" | http://tfs.cc/exp/en-concept-N | http://tfs.cc/term/en-concept-N_-_ont-Concept | http://tfs.cc/ont/Concept | | "country code" | http://tfs.cc/exp/en-country_code-N | http://tfs.cc/term/en-country_code-N_-_ont-ctryCode | http://tfs.cc/ont/ctryCode | | "definiends" | http://tfs.cc/exp/en-definiends-N | http://tfs.cc/term/en-definiends-N_-_term-definitionOf | http://tfs.cc/term/definitionOf | | "definitions" | http://tfs.cc/exp/en-definitions-N | http://tfs.cc/term/en-definitions-N_-_term-hasDefinition | http://tfs.cc/term/hasDefinition | | "designation" | http://tfs.cc/exp/en-designation-N | http://tfs.cc/term/en-designation-N_-_term-hasDesignation | http://tfs.cc/term/hasDesignation | | "designation of" | http://tfs.cc/exp/en-designation_of-N | http://tfs.cc/term/en-designation_of-N_-_term-designationOf | http://tfs.cc/term/designationOf | | "domains" | http://tfs.cc/exp/en-domains-N | http://tfs.cc/term/en-domains-N_-_meta-hasSubjectField | http://tfs.cc/meta/hasSubjectField | | "edussana" | http://tfs.cc/exp/fi-edussana-N | http://tfs.cc/term/fi-edussana-N_-_exp-hasHead | http://tfs.cc/exp/hasHead | | "edussanan paikka" | http://tfs.cc/exp/fi-edussanan_paikka-N | http://tfs.cc/term/fi-edussanan_paikka-N_-_exp-headPosition | http://tfs.cc/exp/headPosition | | "en" | http://tfs.cc/exp/ISO-en-N | http://tfs.cc/term/ISO-en-N_-_exp-English | http://tfs.cc/exp/English | | "englanti" | http://tfs.cc/exp/fi-englanti-N | | | | "englantia" | http://tfs.cc/exp/fi-englantia-N | http://tfs.cc/term/fi-englantia-N_-_exp-English | http://tfs.cc/exp/English | | "eri" | http://tfs.cc/exp/fi-eri-A | | | | "expression" | http://tfs.cc/exp/en-expression-N | http://tfs.cc/term/en-expression-N_-_exp-Expression | http://tfs.cc/exp/Expression | | "false friend" | http://tfs.cc/exp/en-false_friend-N | http://tfs.cc/term/en-false_friend-N_-_meta-seeFalseFriend | http://tfs.cc/term/seeFalseFriend | | "fi" | http://tfs.cc/exp/ISO-fi-N | http://tfs.cc/term/ISO-fi-N_-_exp-Finnish | http://tfs.cc/exp/Finnish | | "frequency" | http://tfs.cc/exp/en-frequency-N | http://tfs.cc/term/en-frequency-N_-_meta-frequency | http://tfs.cc/meta/frequency | | "function" | http://tfs.cc/exp/en-function-N | http://tfs.cc/term/en-function-N_-_sem-hasPurpose | http://tfs.cc/sem/hasFunction | | "gender" | http://tfs.cc/exp/en-gender-N | http://tfs.cc/term/en-gender-N_-_exp-gender | http://tfs.cc/exp/gender | | "head word" | http://tfs.cc/exp/en-head_word-N | http://tfs.cc/term/en-head_word-N_-_exp-hasHead | http://tfs.cc/exp/hasHead | | "hidden" | http://tfs.cc/exp/en-hidden-A | http://tfs.cc/term/en-hidden-A_-_meta-hidden | http://tfs.cc/meta/hidden | | "huomautukset" | http://tfs.cc/exp/fi-huomautukset-N | http://tfs.cc/term/fi-huomautukset-N_-_rdfs-comment | http://www.w3.org/2000/01/rdf-schema#comment | | "huomautus" | http://tfs.cc/exp/fi-huomautus-N | http://tfs.cc/term/fi-huomautus-N_-_rdfs-comment | 
http://www.w3.org/2000/01/rdf-schema#comment | | "instrument" | http://tfs.cc/exp/en-instrument-N | http://tfs.cc/term/en-instrument-N_-_sem-hasInstrument | http://tfs.cc/sem/hasInstrument | | "kaikki" | http://tfs.cc/exp/fi-kaikki-D | http://tfs.cc/term/fi-kaikki-D_-_meta-Value | http://tfs.cc/meta/Value | | "katso" | http://tfs.cc/exp/fi-katso-V | http://tfs.cc/term/fi-katso-V_-_meta-see | http://tfs.cc/meta/see | | "keino" | http://tfs.cc/exp/fi-keino-N | http://tfs.cc/term/fi-keino-N_-_sem-hasMeans | http://tfs.cc/sem/hasMeans | | "kieliopillinen" | http://tfs.cc/exp/fi-kieliopillinen-A | http://tfs.cc/term/fi-kieliopillinen-A_-_meta-expressionProperty | http://tfs.cc/meta/expressionProperty | | "kielioppipiirre" | http://tfs.cc/exp/fi-kielioppipiirre-N | http://tfs.cc/term/fi-kielioppipiirre-N_-_meta-expressionDataProperty | http://tfs.cc/meta/expressionDataProperty | | "kielioppisuhde" | http://tfs.cc/exp/fi-kielioppisuhde-N | http://tfs.cc/term/fi-kielioppisuhde-N_-_meta-expressionObjectProperty | http://tfs.cc/meta/expressionObjectProperty | | "kielitunnus" | http://tfs.cc/exp/fi-kielitunnus-N | http://tfs.cc/term/fi-kielitunnus-N_-_exp-langCode | http://tfs.cc/exp/langCode | | "kiina" | http://tfs.cc/exp/fi-kiina-N | http://tfs.cc/term/fi-kiina-N_-_exp-Chinese | http://tfs.cc/exp/Chinese | | "kohde" | http://tfs.cc/exp/fi-kohde-N | http://tfs.cc/term/fi-kohde-N_-_sem-hasGoal | http://tfs.cc/sem/hasGoal | | "käsite" | http://tfs.cc/exp/fi-käsite-N | http://tfs.cc/term/fi-käsite-N_-_ont-Concept | http://tfs.cc/ont/Concept | | "käsite" | http://tfs.cc/exp/fi-käsite-N | http://tfs.cc/term/fi-käsite-N_-_term-hasReferent | http://tfs.cc/term/hasReferent | | "käsitepiirre" | http://tfs.cc/exp/fi-käsitepiirre-N | http://tfs.cc/term/fi-käsitepiirre-N_-_meta-conceptDataProperty | http://tfs.cc/meta/conceptDataProperty | | "käsitesuhde" | http://tfs.cc/exp/fi-käsitesuhde-N | http://tfs.cc/term/fi-käsitesuhde-N_-_meta-conceptObjectProperty | http://tfs.cc/meta/conceptObjectProperty | | "käsitteellinen" | http://tfs.cc/exp/fi-käsitteellinen-A | http://tfs.cc/term/fi-käsitteellinen-A_-_meta-conceptProperty | http://tfs.cc/meta/conceptProperty | | "käyttö" | http://tfs.cc/exp/fi-käyttö-N | http://tfs.cc/term/fi-käyttö-N_-_term-usage | http://tfs.cc/term/usage | | "label" | http://tfs.cc/exp/en-label-N | http://tfs.cc/term/en-label-N_-_rdfs-label | http://www.w3.org/2000/01/rdf-schema0#label | | "language code" | http://tfs.cc/exp/en-language_code-N | http://tfs.cc/term/en-language_code-N_-_exp-langCode | http://tfs.cc/exp/langCode | | "latinalaistus" | http://tfs.cc/exp/fi-latinalaistus-N | http://tfs.cc/term/fi-latinalaistus-N_-_exp-romanisation | http://tfs.cc/exp/romanisation | | "linkki" | http://tfs.cc/exp/fi-linkki-N | http://tfs.cc/term/fi-linkki-N_-_meta-link | http://tfs.cc/meta0/link | | "luku" | http://tfs.cc/exp/fi-luku-N | http://tfs.cc/term/fi-luku-N_-_exp-number | http://tfs.cc/exp/number | | "luokat" | http://tfs.cc/exp/fi-luokat-N | http://tfs.cc/term/fi-luokat-N_-_rdf-type | http://www.w3.org/1999/02/22-rdf-syntax-ns0#type | | "luokka" | http://tfs.cc/exp/fi-luokka-N | | | | "lähde" | http://tfs.cc/exp/fi-lähde-N | http://tfs.cc/term/fi-lähde-N_-_meta-hasSource | http://tfs.cc/meta0/hasSource | | "maantiede" | http://tfs.cc/exp/fi-maantiede-N | http://tfs.cc/term/fi-maantiede-N_-_ont-Geography | http://tfs.cc/ont0/Geography | | "maatunnus" | http://tfs.cc/exp/fi-maatunnus-N | http://tfs.cc/term/fi-maatunnus-N_-_ont-ctryCode | http://tfs.cc/ont0/ctryCode | | "means" | 
http://tfs.cc/exp/en-means-N | http://tfs.cc/term/en-means-N_-_sem-hasMeans | http://tfs.cc/sem/hasMeans | | "merkitykset" | http://tfs.cc/exp/fi-merkitykset-N | http://tfs.cc/term/fi-merkitykset-N_-_sign-hasMeaning | http://tfs.cc/sign/hasMeaning | | "merkitys" | http://tfs.cc/exp/fi-merkitys-N | | | | "määritelmät" | http://tfs.cc/exp/fi-määritelmät-N | http://tfs.cc/term/fi-määritelmät-N_-_term-hasDefinition | http://tfs.cc/term/hasDefinition | | "määriteltävät" | http://tfs.cc/exp/fi-määriteltävät-N | http://tfs.cc/term/fi-määriteltävät-N_-_term-definitionOf | http://tfs.cc/term/definitionOf | | "nimeke" | http://tfs.cc/exp/fi-nimeke-N | http://tfs.cc/term/fi-nimeke-N_-_rdfs-label | http://www.w3.org/2000/01/rdf-schema0#label | | "nimitettävät" | http://tfs.cc/exp/fi-nimitettävät-N | http://tfs.cc/term/fi-nimitettävät-N_-_term-designationOf | http://tfs.cc/term/designationOf | | "nimitys" | http://tfs.cc/exp/fi-nimitys-N | http://tfs.cc/term/fi-nimitys-N_-_exp-Expression | http://tfs.cc/exp/Expression | | "nimitys" | http://tfs.cc/exp/fi-nimitys-N | http://tfs.cc/term/fi-nimitys-N_-_term-hasDesignation | http://tfs.cc/term/hasDesignation | | "noun" | http://tfs.cc/exp/en-noun-N | http://tfs.cc/term/en-noun-N_-_exp-Noun | http://tfs.cc/exp/Noun | | "number" | http://tfs.cc/exp/en-number-N | http://tfs.cc/term/en-number-N_-_exp-number | http://tfs.cc/exp/number | | "ominaisuus" | http://tfs.cc/exp/fi-ominaisuus-N | http://tfs.cc/term/fi-ominaisuus-N_-_meta-property | http://tfs.cc/meta/property | | "osa" | http://tfs.cc/exp/fi-osa-N | http://tfs.cc/term/fi-osa-N_-_sem-hasPart | http://tfs.cc/sem/hasPart | | "paikka" | http://tfs.cc/exp/fi-paikka-N | http://tfs.cc/term/fi-paikka-N_-_sem-hasPlace | http://tfs.cc/sem/hasPlace | | "part" | http://tfs.cc/exp/en-part-N | http://tfs.cc/term/en-part-N_-_sem-hasPart | http://tfs.cc/sem/hasPart | | "part of speech" | http://tfs.cc/exp/en-part_of_speech-N | http://tfs.cc/term/en-part_of_speech-N_-_exp-catCode | http://tfs.cc/exp/catCode | | "perusmuoto" | http://tfs.cc/exp/fi-perusmuoto-N | http://tfs.cc/term/fi-perusmuoto-N_-_exp-baseForm | http://tfs.cc/exp/baseForm | | "petollinen ystävä" | http://tfs.cc/exp/fi-petollinen_ystävä-N | http://tfs.cc/term/fi-petollinen_ystävä-N_-_meta-seeFalseFriend | http://tfs.cc/term/seeFalseFriend | | "piilotettu" | http://tfs.cc/exp/fi-piilotettu-A | http://tfs.cc/term/fi-piilotettu-A_-_meta-hidden | http://tfs.cc/meta/hidden | | "piirre" | http://tfs.cc/exp/fi-piirre-N | http://tfs.cc/term/fi-piirre-N_-_meta-dataProperty | http://tfs.cc/meta/dataProperty | | "place" | http://tfs.cc/exp/en-place-N | http://tfs.cc/term/en-place-N_-_sem-hasPlace | http://tfs.cc/sem/hasPlace | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Literal datatype values can not be localized as such, but they can be made self-documenting by reflection. TF creates an association between the picklist property exp:langCode , its string value the language code en , and the class exp:English that lets TF self-document language codes as a regular TF concept/term/expression association. The first step is to construe the string value en of the property exp:langCode as the base form of an expression exp1:ISO-en-N designating term term:ISO-631-1-en . This term is classed as an abbreviation (member of term:Abbreviation ) whose referent is the concept exp:English , the English language. (The language code of a term is usually that of its expression, but not necessarily always. English terms may be built on expressions borrowed from foreign languages. They may not mean the same in English as they do in the loan language.) Here is the entry for the language code "en" in TF. It expresses that "en" is the base form of the ISO two-letter language code that designates English.

term:ISO-631-1-en term:hasDesignation exp1:ISO-en-N .
exp1:ISO-en-N exp:baseForm "en" .
term:ISO-631-1-en term:hasReferent exp:English .

All further properties of exp:English can be accessed from the language code through this chain of associations, including designations and descriptions in different languages. An example documentation query is shown below.

pellet4tf query -F1 -q vloc.sparql -F2 ../owl/tf-TFS.owl
---------------------------------------------------
Query Results (18 answers):
prop     | code  | obj        | lang | base
===================================================
catCode  | "A"   | Adjective  | "en" | "adjective"
catCode  | "P"   | Adposition | "en" | "adposition"
catCode  | "N"   | Noun       | "en" | "noun"
catCode  | "V"   | Verb       | "en" | "verb"
ctryCode | "CN"  | China      | "en" | "China"
ctryCode | "FI"  | Finland    | "en" | "Finland"
ctryCode | "FI"  | Finland    | "fi" | "Suomi"
langCode | "en"  | English    | "en" | "English"
langCode | "en"  | English    | "fi" | "englanti"
langCode | "en"  | English    | "zh" | "英语"
langCode | "zh"  | Chinese    | "en" | "Chinese"
langCode | "zh"  | Chinese    | "fi" | "kiina"
langCode | "zh"  | Chinese    | "zh" | "中文"
langCode | "fi"  | Finnish    | "en" | "Finnish"
langCode | "fi"  | Finnish    | "fi" | "suomi"
langCode | "fi"  | Finnish    | "zh" | "芬兰语"
langCode | "iso" |            |      |
langCode | "tfs" |            |      |
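The vloc.sparql script itself is not listed in this document. A minimal sketch of such a chain lookup, assuming that code expressions are designations of term:Abbreviation terms as described above, might look roughly as follows; the actual script may differ.

PREFIX term: <http://tfs.cc/term/>
PREFIX exp:  <http://tfs.cc/exp/>
# sketch: from a code literal via its abbreviation term to the concept and its names in other languages
SELECT ?code ?obj ?lang ?base
WHERE {
  ?codeTerm a term:Abbreviation ;
            term:hasDesignation ?codeExp ;
            term:hasReferent ?obj .
  ?codeExp  exp:baseForm ?code .
  OPTIONAL {
    ?t term:hasReferent ?obj ;
       term:hasDesignation ?d .
    ?d exp:baseForm ?base ;
       exp:langCode ?lang .
  }
}
ORDER BY ?code ?lang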

But keep in mind that literal values are not localized. They are what the name says, literal. If a value needs localizing, it is better to make the property an object property and its values TF resources chosen from a short list.

The TF localization vocabulary is kept in a separate ontology document tf-TFS.owl which imports TFS.owl . The localization document also contains property type information used by the TF front end. The following query reports types of selected concept, term, and expression properties.

pellet4tf query -F1 -q ptype.sparql -F2 ../owl/TFProp.owl
Query Results (17 answers):
prop           | type                     | type2
============================================================
ctryCode       | ConceptDataProperty      | PicklistProperty
referentOf     | ConceptObjectProperty    |
baseForm       | ExpressionDataProperty   |
catCode        | ExpressionDataProperty   | PicklistProperty
gender         | ExpressionDataProperty   | PicklistProperty
headPosition   | ExpressionDataProperty   |
langCode       | ExpressionDataProperty   | PicklistProperty
number         | ExpressionDataProperty   | PicklistProperty
romanisation   | ExpressionDataProperty   |
text           | ExpressionDataProperty   |
hasHead        | ExpressionObjectProperty |
register       | TermDataProperty         | PicklistProperty
status         | TermDataProperty         | PicklistProperty
usage          | TermDataProperty         | PicklistProperty
valuation      | TermDataProperty         | PicklistProperty
hasReferent    | TermObjectProperty       |
seeFalseFriend | TermObjectProperty       |
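The ptype.sparql query is not reproduced above. Assuming, as a guess, that the specific property types are subclasses of meta:Property, a report of this kind could be produced roughly as follows:

PREFIX meta: <http://tfs.cc/meta/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# sketch: list each property with its declared TF property type(s)
SELECT ?prop ?type
WHERE {
  ?prop a ?type .
  ?type rdfs:subClassOf* meta:Property .
}
ORDER BY ?type ?prop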

Classes and their subclasses can be reported in tree form using pellet4tf classify or in tabular form using sparql query child.sparql . Class names can be localized using query cloc.sparql .
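The child.sparql query is not shown in this document; a tabular parent/subclass listing can be obtained with a query along these lines (the shipped query may differ):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# sketch: tabulate classes with their direct subclasses
SELECT ?class ?subclass
WHERE { ?subclass rdfs:subClassOf ?class }
ORDER BY ?class ?subclass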

How to add localizations

There are many ways to find and edit concepts which have not got localizations in a given language. The following scripts are provided in home:/etc/scripts as a starter.

select-concepts-with-lion-fi.sparql select concepts with their Finnish localizations
select-concepts-missing-lion-fi.sparql select concepts without localizations in Finnish
describe-concepts-missing-lion-fi.sparql describe concepts missing localization in Finnish
construct-concepts-missing-lion-fi.sparql construct templates for filling out missing localizations in Finnish

describe-concepts-missing-lion-fi.sparql runs a TF DESCRIBE query on concepts which have no Finnish localization. It shows, among other things, their localizations in other languages. It does not create templates for the missing equivalents. In this case, localizations can be created individually in the editor, or an insert can be made available through the editor's TF Insert menu. construct-unlocalized-fi.sparql creates templates for missing Finnish localizations.
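None of these scripts is reproduced here verbatim. The core pattern they share, selecting concepts that lack a Finnish designation, can be sketched as follows; the shipped scripts may add named graphs and other details.

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ont:  <http://tfs.cc/ont/>
PREFIX term: <http://tfs.cc/term/>
PREFIX exp:  <http://tfs.cc/exp/>
# sketch: concepts with no term whose designation has language code "fi"
SELECT DISTINCT ?inst
WHERE {
  ?inst rdf:type ont:Concept .
  FILTER NOT EXISTS {
    ?term term:hasReferent ?inst ;
          term:hasDesignation ?d .
    ?d exp:langCode "fi" .
  }
}
ORDER BY ?inst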

To avoid duplication of work, one should first check if there are candidates for localization already elsewhere in TF before submitting new proposals.

Sample web localization workflow

Thinking of the division of work between domain experts and language experts, one way of setting up a collaborative localization workflow is to let native domain experts choose vernacular labels for concepts on the basis of their subject understanding. This creates a TF Label ontology; in other words, we get a run-of-the-mill W3C ontology with language tagged rdfs:labels for different languages. In order to upgrade to TF Sign, we run another query to check for existing TF designations for the labels in available expression repositories (related domains, or general language). Then we let native language experts collect and fill out missing linguistic information about the designations, and terminology experts fill out information about the terms connecting the concepts to the expressions.

The work can be expected to flow somewhat like this:

  1. Find resources that miss localizations.
  2. Collect available TF localization candidates and other possibly helpful resources.
  3. Decide which gaps can be filled and which should be submitted for discussion in TF community.
  4. Generate an editing template for the localizations to be added.
  5. Fill out the template, using insert menu, copy/paste and/or query imports.
  6. Check the edits
  7. Save the edits
  8. Commit the edits to a repository
  9. Document the work

The workflow steps are best carried out in a collaborative environment interspersed with human communication and non-TF web queries, engaging the help of the TF community. In this section, we concentrate on the concrete manual and semi-automatic aspects of the workflow, to give a feel of what the TF toolkit can do to help.

Find the gaps

First, find out about the status of the localization. The following query shows which TermFactory concepts have Finnish localizations in the TF term ontology tf-TFS.owl and which not. This is a boilerplate query which takes the name of the term ontology as input.

Show/hide localization status queries

gaps query

The result of the query is shown below.

The next query makes a listing of the concepts missing localization.

misslist query

The resulting listing is shown here.

Collect helpful resources for filling the gaps

The user manual describes setting up resource catalogs. Here is an example query to look up designations matching string term from a catalog of designations.

Show/hide hitlist queries

hitlist query

The result of the query is shown below.

Using canned queries and query aliases, a complete workflow of TF queries can be condensed into one TF address. The query alias below looks up designations matching a pattern and their respective home ontologies from an index database and runs a describe query on the hit list. (For details, refer to section on resource catalogs in the admin manual.)

idx-exp query

The result of the query is shown below.

Generate templates for missing entries

The following samples illustrate generating entry templates for missing localizations.

Show/hide entry template construction

gapped query

Here is the result of the template query.

gapped answer
Fill the gaps

To open the template in the editor, push the Pack button to convert the template construction query into a TF query url and then send the query url to the editor with the Edit button. Alternatively, the query results can be saved to a file and the file address sent to the editor. To save query results to a file, enter a writable URL in the download address. (The query form screenshot of the previous section displays the query url packed from the template construction query.)

The editor does not load the forwarded query automatically (in case some settings, like the active ontology, need changing first). To load the query, open the editor settings and load the query or file to the edit area with the Query button. These steps are shown in the next set of snapshots.

Show/hide loading the entry

editor settings

The editor settings after opening the editor.

editor loaded

The editor text area after loading the entry templates in it.

Check the edits

When the entry has been completed to the user's satisfaction, it is time to validate the edits. This is done with the Check button. All it does is send the HTML edits from the textarea to TF, which reads the edits into RDF triples and writes the triples back to HTML. If something does not look right, hit the Back button to get back to the edits, fix the errors and try again.

The Check button also runs the blank node factor option selected next to it. The default action is to remove blanks (replace anonymous RDF nodes with temporary URIs). The new term and designation shown in the template are anonymous (blank) by default, as shown by their names that start with an underscore and colon _: . If blanks option relabel is chosen, new descriptive names are generated for such anonymous TF resources, taking values for the key fields from the template (language code, base form and category code for designations, designation and referent for terms).

Show/hide checking the edits

The next screenshot shows the editor settings for generating descriptive names.

check with relabeling

Descriptive IRIs generated from the dummy key field contents in the template are shown below. (The IRIs are normally hidden but come to view when the cursor moves inside the entry.)

check after relabeling

Suppose we encounter a term or designation whose key fields are wrong or obsolete (like the example above). If the resource has been made public and is shared by other ontologies, we have a problem. It is not enough to just create a new entry to replace the old one in our repositories. That would leave outside clients already using the old entry holding the bag.

A suboptimal solution is just to correct the key fields of the obsolete resource, without touching the label. But then the label is no longer descriptive of the contents. Discrepancies between descriptive name and key properties undermine the usefulness of the whole notion. A better solution is to create a new entry with its own descriptive name and connect the two entries with an owl:sameAs link. Both old and new resources are faithful to their contents, and the link connects the old entry to the new one. Conflicts between them are captured with an OWL reasoner.

Named resources get edited back to blanks by replacing them with blank labels beginning with _: . The label _:0 is an anonymous variable that does not create bindings.

deprecation link
Save the edits

Edits are saved with the Save button. The save is to the download address if it is specified, and to the source address by default. The save address must be writable by the logged-in user. You may then use the Download button to download the save to the browser and save it from the browser to the local filesystem. For easy local up/downloads, one approach is to mount a TF WebDAV directory as a web folder on the local machine. Editor contents can also be downloaded locally using the CKEditor Save as file or Preview buttons.

Show/hide save

editor save
Diff original and edits

When updating a big ontology, it may not be feasible to take the complete ontology into the editor at once. It is better to query the ontology and perhaps some other helpful ontologies to get a good working set of data to the editor. The contents shown in the editor may then come from several ontologies, among them triples from the ontology one wants to update, called the active ontology. For instance, if one wants to edit terms for some concept, it is useful to include the concept and its properties, although one does not mean to, or perhaps is not even authorized to, edit the concept ontology. The ontology of terms is then the active ontology. The editor supports this by showing editable active ontology content boldfaced and background readonly content in greyed (transparent) italics. To mark provenance, the editor needs to be told the address of the active ontology in the Settings before loading. Blank triples are marked for provenance individually. Therefore markings on blank triples are sound but not complete: a blank triple marked as deleted, added, active or inactive is one, but some triples may remain unmarked.

After an editing session, it is time to update the active ontology with the edits. The update of the active ontology consists of removal of those triples from the active ontology that the user has deleted from the original, and adding those triples to it that the user has edited in. In order to do these updates, the editor needs to calculate the symmetric difference of the original and the edits. Triples in the original missing from the edits are to be deleted, and triples in the edits not in the original are to be added. The edit form button Diff shows the differences between the edits and the original graphically.
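For illustration only, the two halves of such a symmetric difference can be expressed in SPARQL, assuming the original and the edits were loaded into two hypothetical named graphs; the editor computes the difference internally, and blank nodes need extra care as noted above.

# sketch: triples to delete, i.e. present in the original but not in the edits
CONSTRUCT { ?s ?p ?o }
WHERE {
  GRAPH <urn:x-tf:original> { ?s ?p ?o }
  FILTER NOT EXISTS { GRAPH <urn:x-tf:edits> { ?s ?p ?o } }
}
# the triples to add are obtained by swapping the two graph names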

Show/hide diff

editor diff

The Diff button also runs the blank node factor option selected from the pull-down. The default action is to remove blanks (replace anonymous RDF nodes in the edits with temporary URIs). The new term and designation shown in the template are anonymous (blank) by default, as shown by their names that start with an underscore and colon _: . If blanks option relabel is chosen, new descriptive names are generated for such anonymous TF resources, taking values for the key fields from the template (language code, base form and category code for designations, designation and referent for terms).

If one wants to save a snapshot of an updating session in order to continue editing later, the simplest procedure is to save the diff into a file. The diff contains both the original and the edits. The editor is able to load the diff back to the editor so that editing can continue where it was left off. Alternatively, one may save the original and the edits to separate files, load the original into the editor and cut-paste the edits into the editarea.

Commit the edits

Edits are committed to the active ontology repository with the Commit button. The active address in the Settings must point to an ontology repository writable by the user and the overwrite checkbox must be checked. That is all there is to it. But it is good to be careful with commits.

The following workflow sequence updates a repository with a previously saved entry. It shows how to use the editor to check the update before commit and double check the result after the commit.

Show/hide committing the edits

Suppose a new Finnish entry for adjective has been saved in a document Laatusana.html in a previous session and is now up for commit into repository tdb+fi-TFS.owl . We first request an entry for the concept exp:Adjective in the repository to see what it has already got by way of adjective designations.

adjective entry before commit

Apparently nothing there yet. Next we load the new entry into the editor. We keep the empty result of the previous query in the editor, by checking same original , so that we can run a diff between the edits and the original.

load edits from laatusana.html

We run the diff with the repository as active, so we can see what is already in the repository and what is new. The diff shows, as expected, that the Finnish entry is entirely new to the repository (all triples are underlined).

adjective diff

It seems safe to commit the entry. After the commit, all the triples in the entry are bolded, showing that they are now in the repository.

adjective commit

When we request the Finnish entry for adjective from the repository again, we get the entry just committed to it.

adjective entry after commit

Harvesting equivalents from the Web of Data

In this section, we describe a workflow for harvesting multilingual term equivalents from the Web of Data and mapping them to TF entries. We also describe the process of providing entries with syntactic information and converting them into Grammatical Framework lexicons. The steps are

  1. Find language labels from the Web of Data
  2. Map language labels to TF entries.
  3. Edit the entries for GF in TF HTML format
  4. Write entries as a GF lexicon.
dbpedia workflow

Find localization labels from endpoints

SPARQL endpoints such as DBPedia or FactForge contain multilingual equivalents in the form of rdfs (or skos) labels from Wikipedia in particular. (As of 4/2013, DBPedia reports having 25469901 rdfs:labels.) To constrain the search, select a subject field or covering class for the entries to look for. Exactly what domains and categories to use requires acquaintance with the ontology schemas used by the repository. DBPedia uses Wikipedia Category labels to classify resources by subject field. One can use Wikipedia to select relevant categories or consult DBPedia ontology .

Say we need entries for names of different edible fish in different languages. This is how it might go:

  1. use a TF index repository or a web search engine to find the name of the Wikipedia category of Edible fish.
  2. send a DESCRIBE query for the category to a sparql endpoint to find the relation linking the fish to the category in DBPedia.
  3. get the edible fish and their labels from the endpoint with a canned query or query alias
  4. run another canned query to convert the labels to TF entries.
  5. edit the entries with the TF editor.

TermFactory alias subjectlabels names a sparql script that constructs rdfs:labels for items classified by a given property-value pair. Below is a command line to obtain names of edible fish from dbpedia live and a sample of the results:

[] tf:mapping [ tf:name "subjectlabels" ;
    rdfs:comment "labels for subjects by property and object"@en ,
                 "nimikkeet subjekteille ominaisuuden ja objektin mukaan"@fi ;
    tf:altName "home:/etc/scripts/construct-labels-by-property-and-object.sparql" ] .

The query is here:

# construct-labels-by-property-and-object.sparql
# settings i=dcterms:subject+dbp:Category:Edible_fish&s=dbplive
#en labels for subjects having given property and object (prop obj)
#fi nimikkeet varannoille joilla on annettu ominaisuus ja kohde (prop obj)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
CONSTRUCT { ?item rdfs:label ?label }
WHERE {
  ?item $INPUT1 $INPUT2 .
  ?item rdfs:label ?label .
}

The query with the sample settings in the header, run from the command line:

pellet4tf query -q subjectlabels -i "dcterms:subject dbp:Category:Edible_fish" -s dbplive

produces a multilingual list of fish names.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
dbpedia:Patagonian_toothfish rdfs:label
    "Patagonisk tannfisk"@no , "Légine australe"@fr , "Ledovka patagonská"@cs ,
    "マジェランアイナメ"@ja , "비막치어"@ko , "Antarctische diepzeeheek"@nl ,
    "Schwarzer Seehecht"@de , "Nototènia negra"@ca , "Antar patagoński"@pl ,
    "Merluza-negra"@pt , "Patagonian toothfish"@en , "Dissostichus eleginoides"@it ,
    "Dissostichus eleginoides"@es .

The result page is returned to the browser. It is downloaded to the user's dav folder by name Edible_fish.ttl when dav+Edible_fish.ttl is entered in the Download box. If the endpoint is taking too much time, the query can be run in the background by checking the checkbox Job in the Settings overlay.

Enrich localization labels to TF entries

The next two query aliases blankterms- and namedterms- nickname canned queries that construct language tagged labels into primitive TF entries. The part of speech is guessed to be noun (the most common case with terms). The queries construct TF Sign entries consisting of the full triad of term, designation, and referent. Rule blankterms- generates blank terms and expressions, postponing creation of descriptive names until the key fields (language, baseform, part of speech) have been validated. Rule namedterms- directly generates descriptive names for the terms and designations. Again, snippets of the query results are shown after the alias.

[] tf:mapping [ tf:prefix "blankterms-" ; tf:altPrefix "http://localhost/TermFactory/query?q=file%3Aio%2Fsparql%2Fconstruct-blank-concept-entries-for-labels.sparql&r=home:/etc/sparql/prefix.ttl&r=" ] .
dbpedia:Patagonian_toothfish term:referentOf [ a term:Term ; term:hasDesignation [ a exp:Designation ; exp:baseForm "ledovka patagonská" ; exp:catCode "N" ; exp:langCode "cs" ] ] ; ...
[] tf:mapping [ tf:prefix "namedterms-" ; tf:altPrefix "http://localhost/TermFactory/query?q=file%3Aio%2Fsparql%2Fconstruct-named-concept-entries-for-labels.sparql&r=home:/etc/sparql/prefix.ttl&r=" ] .
term:cs-ledovka_patagonská-N_-_dbp-Patagonian_toothfish a term:Term ; term:hasDesignation exp1:cs-ledovka_patagonská-N ; term:hasReferent dbpedia:Patagonian_toothfish . exp1:cs-ledovka_patagonská-N a exp:Designation ; exp:baseForm "ledovka patagonská" ; exp:catCode "N" ; exp:langCode "cs" .
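The construct queries named by the aliases are not reproduced here. A minimal sketch of the blank-entry variant, assuming it simply reads rdfs:labels and emits the term/designation/referent triad shown above (the shipped query may also normalize case and guess further features), might be:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX term: <http://tfs.cc/term/>
PREFIX exp:  <http://tfs.cc/exp/>
# sketch: turn each language tagged label into a blank term with a blank designation
CONSTRUCT {
  ?item term:referentOf [ a term:Term ;
          term:hasDesignation [ a exp:Designation ;
                                exp:baseForm ?base ;
                                exp:catCode  "N" ;
                                exp:langCode ?lang ] ] .
}
WHERE {
  ?item rdfs:label ?label .
  BIND (STR(?label)  AS ?base)
  BIND (LANG(?label) AS ?lang)
}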

The aliases can be composed to run the steps in one go with query alias gframe-blankterms-dbpcat-Edible_fish .

Enrich the harvested entries

The generated raw entries may be edited in the TF editor using grid layout. For checking the key fields of the generated terms, a concept-oriented grid view may be appropriate. A command line to generate this layout is shown below.

factor -F --tree --skin=tf2html2 --template=gf --format=HTML edibleblank.ttl > edibleblank.html

The layout template nicknamed gfc, shown below, imports the concept oriented layout ont0 and adds details about the grid layout.

@prefix meta: <http://tfs.cc/meta/> .
@prefix term: <http://tfs.cc/term/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ont: <http://tfs.cc/ont/> .
[] owl:imports <home:/etc/templates/ont0.ttl> .
# set template root
meta:Entry meta:hasRoot ont:Concept ;
           meta:tree "true" .
# hide values of these properties from grid
rdf:type meta:css "hide" .

Show/hide blank entry grid

Besides editing the given fields of the grid, new term lines can be added for missing languages by copy-pasting an existing table cell. A complete entry is copy-pasted with Control-Shift-N.

Simple term equivalents (monadic predicates) such as the names here can be directly written into a GF lexicon using the TermFactory GF format writer. To handle relational concepts intelligently, Grammatical Framework prefers entries with morphological and syntactic valency (government) information. The following alias names a query which adds to a TF entry default grammatical features. There are two properties, a TF native feature syn:frame and a GF native feature gf:lin . A snippet of the result is again shown after the alias. Expert users can code the GF frame formula directly in the lin feature. For non-experts, the syn:frame feature takes as values traditional style frame descriptions of style 'V something to something' from which the GF lin feature value can be generated by rule or just listed in the gf mapping file gf-mapping.n3 .

[] tf:mapping [ tf:prefix "gframe-" ; tf:altPrefix "http://localhost/TermFactory/query?q=file%3Aio%2Fsparql%2Fconstruct-frame-properties.sparql&m=1&r=" ] . term:cs-ledovka_patagonská-N_-_dbp0-Patagonian_toothfish a term:Term ; syn:frame "N" ; gf:lin "mkN str" . term:hasDesignation exp1:cs-ledovka_patagonská-N ; exp1:cs-ledovka_patagonská-N a exp:Designation ; exp:baseForm "ledovka patagonská" ; exp:catCode "N" ; exp:gender "n" ; exp:langCode "cs" ; exp:number "sg" .

Edit entries for GF

The grammatical and syntactic features may be edited using a term oriented grid layout gft, whose template is shown below. This template imports the term oriented layout and details which properties show (on hover) the name of the property ("show"), and which properties are hidden ("hide"). These settings only concern css layout, not the underlying html.

@prefix meta: <http://tfs.cc/meta/> .
@prefix term: <http://tfs.cc/term/> .
@prefix exp: <http://tfs.cc/exp/> .
@prefix ont: <http://tfs.cc/ont/> .
@prefix syn: <http://tfs.cc/syn/> .
@prefix gf: <http://www.grammaticalframework.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
[] owl:imports <home:/etc/templates/term0.ttl> .
# set root class
meta:Entry meta:hasSubClass term:Term .
# show names of these properties in grid
exp:gender meta:css "show" .
syn:frame meta:css "show term" .
gf:lin meta:css "show term" .
# hide values of these properties from grid
rdf:type meta:css "hide" .
meta:source meta:css "hide" .
term:hasReferent meta:css "hide" .

Here is a sample command line to produce the term grid layout.

factor -F --indent --tree --skin=tf2html2 --template=gft --format=HTML edibleframe.ttl > edibleframe.html

This is what the term grid layout looks like.

Show/hide term grid

term grid

Once the entries have the requisite grammar information, the TF term ontology can be written into a GF lexicon using the TermFactory GF lexicon writer .

Import the entries to mediawiki

Edit the entries in mediawiki

Standalone equivalents editor

A standalone javascript editor for locating and editing equivalents was written in the MOLTO project.

Changes made to TF entries in the equivalents editor can be carried back to a matrix TF ontology maintaining those terms using the TF web API. A RESTful solution is this. First convert the json back to TF using a query like

http://localhost:8080/TermFactory/query?url=changes.json&schema=TFS.owl&rw=relabel&f=TURTLE

The result is saved in the default TF database. Then use the following query to commit the equivalents to the matrix ontology.

http://localhost:8080/TermFactory/edit?m=matrix&d=original&a=changes.json

Here, matrix stands for the URL of the matrix ontology, original for the previous version of the equivalents, and changes.json is the database location of the newly converted equivalents ontology. As a result, an updated copy of the matrix ontology appears in the database, from which it can be fetched to TermFactory Wiki.

Sparql query select-concepts-missing-lion-by-lang.sparql produces a table of concepts missing localizations in a given group of languages. With the JSON output setting, it produces a JSON table that can be read into the equivalents editor. It uses the projection syntax extension of SPARQL 1.1.

The localize.perl script

The command line script localize.perl is a wrapper to the pellet4tf SPARQL localization query localize.sparql . It accepts optional parameters for the localization language, the type of entries to localize, the source ontology to use, and the output format. The following example produces a localization file in JSON for property names in Finnish using a localization vocabulary on http://tfs.cc . The defaults for the options and the command run by the script are shown in the second example. Warning: the script creates a temporary file by name tmp.sparql in the directory where it is run. Examples:

> localize.perl -h
# write localization file for given LANG and TYPE
# usage: localize -l LANG -t TYPE -e engine -B bridge -f format -q script file...
# example: localize -l fi -t meta:Property
# default: localize # localizes TF schema vocabulary
# help: localize -h

> localize -l fi -t meta:Property -f JSON http://tfs.cc/owl/fi-TFS.owl
localize TF properties in Finnish

> localize -t bio:Country -e Stacked -B biobridge.ttl biocaster.ttl

Option -h prints usage and exits. The plain script name without arguments localizes the TermFactory vocabulary in TFS.owl. Option -l LANG only localizes to LANG. Option -t TYPE only localizes instances of TYPE. Option -e chooses the Pellet4TF engine to use. Option -B sets the bridge for the Pellet4TF Stacked engine. Option -q chooses the localization script to use (the default is $TF_HOME/etc/scripts/lion.sparql). The remaining arguments are localization ontologies (the default is $TF_HOME/owl/tf-TFS.owl). There is an analogous script $TF_HOME/etc/scripts/translate for the TFTop schema.

Another script, script/localize2.perl, completes a given TermFactory JSON localization table for a given language using a localization vocabulary and a schema ontology. The script reads in a partially completed localization table as produced by localize, and uses the localization and schema ontologies to fill the gaps, looking for stopgap localization strings for those concepts that have no label in the table. The script looks for translations in the localization vocabulary using sign properties defined in the schema ontology. If there is no exact label for a given concept in the localization vocabulary, the script looks for an as yet unused label of one of its superclasses. For instance, if an equivalent for "black adder" is missing in a given language, it may propose the translation of the superclass "viper". Examples:

> localize2 -h
# use TFLocalizer to fill out json localization file for given lang schema and localization vocabulary
# usage: localize2 -g -s schema -L lion -l lang *.json
# example: localize2 -s TFTop.owl -L tf-TFS.owl -l fi ../io/fi0.json
# help: localize2 -h

> cat fi.json | tail
  { "inst": { "type": "uri" , "value": "http://tfs.cc/exp/Mandarin" } ,
    "lang": { "datatype": "http://www.w3.org/2001/XMLSchema#string" , "type": "typed-literal" , "value": "fi" } ,
  } ,
  { "inst": { "type": "uri" , "value": "http://tfs.cc/exp/Taiwanese" } ,
    "lang": { "datatype": "http://www.w3.org/2001/XMLSchema#string" , "type": "typed-literal" , "value": "fi" } ,
  } ] } }

> localize2 -s TFTop.owl -L tf-TFS.owl -l fi ../io/fi.json | tail
...
  { "inst": { "type": "uri", "value": "http://tfs.cc/exp/Mandarin" },
    "lang": { "datatype": "http://www.w3.org/2001/XMLSchema#string", "type": "typed-literal", "value": "fi" },
    "base": "kiina", "hyper": "http://tfs.cc/exp/Chinese" },
  { "inst": { "type": "uri", "value": "http://tfs.cc/exp/Taiwanese" },
    "lang": { "datatype": "http://www.w3.org/2001/XMLSchema#string", "type": "typed-literal", "value": "fi" },
    "base": "kiina", "hyper": "http://tfs.cc/exp/Chinese" }
  ]} }

Here Mandarin and Taiwanese do not have a direct Finnish equivalent in the localization vocabulary (tf-TFS.owl), but both get localized by their hypernym kiina 'Chinese'. Setting switch -g ('g' for 'global') prevents using the same hypernym twice, which guarantees a one-to-one localization/globalization relation but may not fill as many gaps.
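The hypernym fallback can be pictured as a query along the following lines. This is an illustrative sketch only, assuming rdfs:subClassOf hierarchies and rdfs:label localizations; the actual TFLocalizer works through the sign properties defined in the schema ontology.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# for concepts lacking a Finnish label, propose the label of a superclass
SELECT ?concept ?hyper ?base
WHERE {
  ?concept rdfs:subClassOf+ ?hyper .
  ?hyper rdfs:label ?base .
  FILTER ( lang(?base) = "fi" )
  FILTER NOT EXISTS {
    ?concept rdfs:label ?own .
    FILTER ( lang(?own) = "fi" )
  }
}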

List of TF options

A  active           a  readAll
B  bridge           b  writeAll
C  conf             c  tree
D  describe         d  depth
E  edits            e  engine
F  file             f  format
G  describe-query   g  glob
H  lynx             h  help
I  interpretation   i  input
J  joiner           j  joiner
K  skin             k  keep
L  lion             l  lang
M  offset           m  merge
N  limit            n  notry
O  original         o  out
P  pack             p  prefixes
Q  queryText        q  queryLine
R  root             r  repo
S  schema           s  source
T  template         t  time
U  uri              u  url
V  echo             v  level
W  factor           w  job
X  across           x  over
Y  style            y  try
Z  timestamp        z  extension

Survey

This section surveys third-party tools that can be applied to TermFactory. The list is neither complete nor always up to date.

Editors

XML editors

Ontologies in RDF/XML and TF entries in HTML, TBX or any other XML format can be syntactically edited using any XML editor. The advantage is that an XML editor can be told to be pedantic about syntax, and there are far more tools for XML than for RDF or OWL. The disadvantage is that an XML editor cannot check the semantics.

XMLmind XML editor

The XMLmind XML Editor from Pixware is a customisable XML editor written in Java which allows you to edit large, complex, modular XML documents in a structured WYSIWYG mode, i.e. the page looks uncluttered but you control what you do as precisely as if you were using an XML programmer's editor. XML can be rendered to a large variety of formats using the open source Formatting Objects toolchain. It natively supports MathML 2 Presentation Markup. XXE is highly customizable, without programming, by local gurus and consultants.

The TermFactory toolkit includes plugins (configuration addons) for the free XML document editor XMLmind which allow structured editing of the LISA OSCAR TBX terminology format and of the MultiTerm XML export format in WYSIWYG mode. The XMLmind XML editor user interface has been localized into Finnish.

It would not be difficult to construct an XMLmind skin for editing TF HTML entries offline with XMLmind.

RDF and OWL editors

There is quite a selection of ontology editors to choose from. Listings are maintained at W3C, Wikipedia, and other places. A small sample is surveyed below.

Swoop

The MINDSWAP Swoop OWL editor (version 2.3beta4 in 2007, no longer under development) was something like a testbed for the Pellet reasoner. It still finds some use as a more or less direct graphical interface to Pellet services. It has a funny Venn diagram visualiser and an ontology partitioner which seems to do only proper partitions, so it may not be able to split a well-connected ontology. It has a query evaluator, but only for RQL queries. It is worth keeping in mind as a model or source for editing functionality in TF.

Protege

Protégé is a free, open source ontology editor and knowledge acquisition system. Like Eclipse, Protégé is a framework for which various other projects contribute plugins. The application is written in Java and relies heavily on Swing for its rather complex user interface. Protégé currently has over 100,000 registered users. Protégé is developed at Stanford University in collaboration with the University of Manchester. Version 3 was developed at Stanford and has the most contributed extensions. Version 4 is developed by the University of Manchester and is built on the Java Eclipse IDE. It supports OWL 2.0. A side-by-side comparison of Protege 3 and 4 is found in the Protege Wiki.

(version 0.4) The current version of Protege used in TF development is version 4. It is downloadable from http://protege.stanford.edu/.

Protege 4 (as of version 4.1, Aug 2011) has limited support for working with anonymous individuals: there are no editing facilities for them, and in several places anonymous individuals do not show up in the Protege 4.1 graphical interface at all. This makes working with anonymous individuals difficult.

Protege plugins

There are many contributed plugins for Protege 3 and 4, of which the following deserve mention here.

  • Collaborative Protege (3)
  • ProSE (4)
Collaborative Protege

Collaborative Protege is an extension of Protege 3 that supports collaborative ontology editing. (It is not known if or when Collaborative Protege will come to Protege 4.) In addition to the common ontology editing operations, it enables annotation of both ontology components and ontology changes. It supports searching and filtering of user annotations, also known as notes, based on different criteria. Collaborative Protege implements two types of voting mechanisms that can be used for voting on change proposals. Multiple users may edit the same ontology at the same time. In multi-user mode, all changes made by one user are seen immediately by the other users. There are two working modes available for Collaborative Protege. Both modes support multiple users working on an ontology:

  1. The multi-user mode - allows multiple clients to simultaneously edit the same ontology hosted on a Protege server. All changes made by one client are immediately visible to the other clients. This mode is also referred to as client-server mode or concurrent mode and requires a client-server setup. This mode is based on the implementation of the multi-user Protege and is the preferred mode in which Collaborative Protege should be run.
  2. The standalone mode - allows multiple users to access the same ontology in succession. The ontology can be stored on a shared network drive and all clients will access the same project files. However, simultaneous access is not possible. This mode is also referred to as the consecutive mode.

The next figure shows the Collaborative Protege graphical user interface.


Collaborative Protege
ProSE

The ProSE plugin guides a human user in choosing what to import from one ontology into another. In particular, it helps the user choose a large enough subset of the external ontology that the risk of unintentionally asserting new indirect relations between the imported concepts is minimised. It also helps make sure that the imported subset is no bigger than required for that purpose. In the TF scenario, one may or may not want to enrich relations between the imported concepts, depending on the case at hand. When needed, the ProSE plugin can be used to determine a suitable set to import.

TopBraid Suite

As part of the TopBraid Suite, Composer incorporates a flexible and extensible framework with a published API for developing semantic client/server or browser-based solutions that can integrate disparate applications and data sources.

Implemented as an Eclipse plug-in, Composer serves as a development environment for TopBraid Ensemble™ and for all the applications delivered using TopBraid Live™. Composer is used to develop ontology models, configure data source integration as well as to customize dynamic forms and reports.

Two versions are available - Standard Edition and Maestro Edition.

NeON Toolkit

NeOn is a 14.7 million euro project involving 14 European partners, co-funded by the European Commission's Sixth Framework Programme under grant number IST-2005-027595. NeOn started in March 2006 and has a duration of 4 years. Its aim is to advance the state of the art in using ontologies for large-scale semantic applications in distributed organizations. In particular, it aims at improving the capability to handle multiple networked ontologies that exist in a particular context, are created collaboratively, and might be highly dynamic and constantly evolving.

The first release of the NeOn Toolkit, one of the core outcomes of the NeOn project, is available for download and testing from the NeOn Toolkit and Community site.

(From NeOn Wiki:) The NeOn toolkit is a state-of-the-art, open source multi-platform ontology engineering environment, which aims to provide comprehensive support for all activities in the ontology engineering life-cycle. The toolkit is based on the Eclipse platform, a leading development environment. The toolkit provides an extensive set of plug-ins (currently 45 plug-ins are available) covering all aspects of ontology engineering, including:

  • relational database integration
  • modularisation
  • visualisation
  • alignment
  • project management

The NeOn Toolkit is part of the reference implementation of the NeOn architecture. The major goal of the NeOn project is to provide methodology, infrastructure and tools for designing and managing a new generation of knowledge-intensive semantic applications. The toolkit is implemented as an Eclipse application.

NeOn DIG plugin

A DIG reasoner interface plugin is advertised for NeOn as follows:

The purpose of the DIG Plugin is the implementation of the DIG Interface version 1.1. The DIG Description Logics Interface Version 1.1 is a specification for defining a new interface for DL Systems. It is effectively an XML Schema for a DL concept language along with ask/tell functionality. In two words, the DIG interface provides a standardized way to access and query a reasoner. The user initially chooses the desired action she is interested in. Then the ontology is translated into the DIG interface and sent to the reasoner along with the queries that have been posed. After the query processing inside the reasoner has taken place, the reasoner sends back to the user the response encoded in the DIG Interface and the user can extract the answer to her query. The interested reader is referred to the protocol specification for further information.

The entire concept language and tell/ask functionality is enough to capture every functionality that is usually provided by a reasoner. In the context of the use cases in the NeOn-Project the reasoning tasks that are of utmost importance to us are ontology coherency and classification. More concretely:

  • Ontology Coherency: The reasoner takes as input the (translated into the DIG protocol) ontology and for each concept it returns true or false, depending on whether the corresponding concept is satisfiable or unsatisfiable, respectively.
  • Classification: The reasoner takes as input the (translated into the DIG protocol) ontology and returns the inferred classification of the various concepts, as opposed to the explicit one that the user initially sees. Moreover, for every concept in the inferred hierarchy we get whether it is satisfiable or not.

It must be highlighted at this point that the DIG Description Logics Interface that has been partially implemented in this plugin does not come bound to any reasoner. Contrary to that, it only provides an interface to query any reasoner that supports the DIG Protocol. The motivation behind this is that the current plugin is a general-purpose plugin that is intended to be used with different reasoners, so we found it meaningful not to restrict it to a particular reasoner, but to rather provide the user with the flexibility to do that themselves, in full accordance with their needs.

OwlSight

OwlSight is a web based ontology browser from ClarkParsia that uses the Pellet reasoner. The browser is written with the Google Web Toolkit and it uses the OWL API to access ontologies.

Ontology Browser

Ontology Browser is a tool to dynamically generate documentation for ontologies, based on the OWLDoc software. It uses the OWL API to access ontologies and has an interface to the FaCT++ reasoner.

OntoTrack

OntoTrack is a browsing and editing tool for OWL ontologies developed at Ulm University using the OWL API.

OntoWiki
OntoWiki is a tool providing support for agile, distributed knowledge engineering scenarios.

OntoWiki facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents.

Comparing OWL editors for TF

We have tried three Java Eclipse-based OWL editors with TF ontologies. Here are some of our observations:

The combined PULS/BioCaster epidemic ontology contains about 3000 classes (owl:Class 2709) and 20K instances (owl:Thing 22286). There are about 300K asserted triples. The OWLIM reasoner adds another 600K so the inference closure contains about 1M triples.

TopBraid loads the ontology in 00 seconds. It takes a couple of minutes to form the closure with OWLIMSwift. TopBraid feels triple oriented. It seems reasonably robust and provides some useful services (e.g. instance statistics exported in spreadsheet form).

The NeOn toolkit takes several minutes to load epi.owl (downloading it from the web). Most of the time is reported as spent reading RDF triples and building indices. The class editor is comparable to Protege. The individual editor does not create hyperlinks to object property objects. Reasoners are not mentioned in the NeOn Toolkit GUI (Version 1.2.3, Build id B1023, 2009-07-30). NeOn has little built-in support for browsing and editing TF ontologies.

Protege 4 (version 4.0.114) has fewer features than Protege 3, but is the best suited of the three for browsing TF ontologies, thanks to bundled reasoner support and hyperlinks.

For Protege 4 help, see http://protegewiki.stanford.edu/wiki/Protege4UserDocs .

Validators

A validator is a computer program used to check the validity or syntactical correctness of a fragment of code or document. The term is commonly used in the context of validating HTML, CSS and XML documents or RSS feeds though it can be used for any defined format or language. A reasoner (semantic reasoner, reasoning engine, rules engine) is a piece of software able to infer logical consequences from a set of asserted facts or axioms.

TF validation can be done using XML validators, RDF validators, OWL syntax validators, OWL semantic reasoners, and TermFactory's purpose-built tools (see TF3). It is advisable to check third party ontologies with a validator before conversion, to spot coding errors in incoming data.

W3C RDF validation service

The W3C RDF validation service is based on Another RDF Parser (ARP). The service supports the Last Call Working Draft specifications issued by the RDF Core Working Group, including datatypes. The service does not do any RDF Schema Specification validation. Note that other online RDF validation services are available.

WonderWeb OWL validator

WonderWeb OWL validator can be used to check the conformance of TF ontologies to the OWL 1.0 standard. TF adheres to OWL DL in order to benefit from ontology reasoners.

rdf:about

There is an RDF validator at http://www.rdfabout.com/demo/validator/index.xpd .

Validation with reasoners

The Pellet reasoner offers various tools for ontology validation. In particular, pellet consistency and pellet explain are handy:

pellet help consistency

PelletConsistency: Check the consistency of an ontology
Usage: pellet consistency [options] <file URI>...
Argument description:
 --help, -h                  Print this message
 --verbose, -v               Print full stack trace for errors.
 --config, -C (configuration file)
                             Use the selected configuration file
 --loader, -l (Jena | OWLAPI | OWLAPIv3 | KRSS)
                             Use Jena, OWLAPI, OWLAPIv3 or KRSS to load the ontology (Default: OWLAPIv3)
 --ignore-imports            Ignore imported ontologies
 --input-format (RDF/XML | Turtle | N-Triples | N-Quads)
                             Format of the input file (valid only for the Jena loader).
                             Default behaviour is to guess the input format based on the file extension.

pellet help explain

PelletExplain: Explains one or more inferences in a given ontology including ontology inconsistency
Usage: pellet explain [options] <file URI>...
The options --unsat, --all-unsat, --inconsistent, --subclass, --hierarchy, and --instance are mutually exclusive.
By default the --inconsistent option is assumed. In the following descriptions C, D, and i can be URIs or local names.
Argument description:
 --help, -h                  Print this message
 --verbose, -v               Print detailed exceptions and messages about the progress
 --config, -C (configuration file)
                             Use the selected configuration file
 --ignore-imports            Ignore imported ontologies
 --unsat (C)                 Explain why the given class is unsatisfiable
 --all-unsat                 Explain all unsatisfiable classes
 --inconsistent              Explain why the ontology is inconsistent
 --hierarchy                 Print all explanations for the class hierarchy
 --subclass (C,D)            Explain why C is a subclass of D
 --instance (i,C)            Explain why i is an instance of C
 --property-value (s,p,o)    Explain why s has value o for property p
 --method, -m (glass | black)
                             Method that will be used to generate explanations (Default: glass)
 --max, -x (positive integer)
                             Maximum number of generated explanations for each inference (Default: 1)

Reasoners

A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine by providing a richer set of mechanisms to work with. The inference rules are commonly specified by means of an ontology language, often a description logic language. Many reasoners use first-order predicate logic to perform reasoning; inference commonly proceeds by forward chaining and backward chaining.

This section documents ontology query and rule languages and reasoners that may be used in TF. For a survey see Sattler .

Jena ARQ query engine for the SPARQL query language

ARQ is an open source SPARQL language query engine over RDF graphs implemented in Java.

The Jena ARQ engine implements DESCRIBE queries in a fixed way (a one-level, all-properties query). The result set differs from, and is usually smaller than, that of TF DESCRIBE. Jena ARQ reads RDF models, so it does not handle ontology imports (imports are an OWL construct). On the other hand, the arq command line accepts multiple data arguments.

Jena ARQ provides a number of extension functions, for example for manipulating URIs. They are listed here.
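As an illustration only, and assuming the standard ARQ function library prefix with the afn:namespace and afn:localname functions available in the ARQ build in use, a query can split resource URIs like this:

PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
# split each resource URI into its namespace and local name parts
SELECT ?x (afn:namespace(?x) AS ?ns) (afn:localname(?x) AS ?name)
WHERE { ?x a ?type }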

Fact++

FaCT++ is the new generation of the well-known FaCT OWL-DL reasoner. FaCT++ uses the established FaCT algorithms, but with a different internal architecture. FaCT++ is implemented in C++ in order to create a more efficient software tool and to maximise portability. FaCT++ is available in the Protege editor.

Pellet

Pellet is an open source reasoner for OWL 2 DL in Java. It provides standard and cutting-edge reasoning services for OWL ontologies. Pellet supports queries in SPARQL-DL, an OWL-DL query language that syntactically extends the SPARQL query language for RDF. Pellet is available in the Protege editor.

The latest (and apparently last) stable version of Pellet is pellet-2.3.0 (Aug 22, 2011); the current TF OWL reasoner is based on it. Clark & Parsia has since gone commercial with Stardog, a SPARQL 1.1 database with an OWL 2 reasoner; its current version is 1.1.2 (08 Jan 2013). Stardog is a fast, commercial RDF database: SPARQL for queries, OWL for reasoning, pure Java for the enterprise. It is targeted at query performance for complex SPARQL queries, and claims the deepest, most comprehensive OWL reasoning support of any commercial RDF database available. So far uptake appears modest (some 400 followers).

Since Pellet 2.0 Release (November 16, 2009), Pellet offers

  • full OWL 2 support (modulo a few bugs that will be fixed in the 2.1 release)
  • supports domain and range axioms, class expressions, qualified cardinality restrictions, literal constants, annotations, and nested class expressions in SPARQL queries
  • support for all SWRL builtins, including previously missing builtins (substring, tokenize, and optional precision parameters for roundHalfToEven)
  • optimized support for OWL 2 EL reasoning; OWL 2 EL reasoner is autoselected based on data input
  • supports automated ontology module extraction
  • supports incremental classification
  • supports fine-grained inference extraction
  • enhanced SWRL rules performance
  • OWLAPI v3 support
  • lots of improvements, cleanups to Pellet’s command line tools
  • updated to work with Jena 2.6.2 — Pellet is the only DL reasoner available from Jena
  • supports explanations via Jena
  • support autoselecting best SPARQL query engine based on input query
  • user-defined timeouts for reasoning
  • switch to dual license model to support commercial and open source projects

This release marks a change in Pellet development process: starting with 2.1, Pellet will be released according to a time-based development cycle. We will do four quarterly releases per year. We will make point releases between the quarterly releases, as necessary, to fix critical bugs only. Thus, the release schedule for the 2.x series will be 29 March 2010, 28 June 2010, 27 September 2010, 20 December 2010.

The following notes are from pellet.owldl.com/downloads/pellet-tutorial.pdf .

Pellet can be used via three different APIs

  • Internal Pellet API
  • Manchester OWLAPI
  • Jena API

Each API has pros and cons. Choice will depend on your applications’ needs and requirements.

Pellet Internal API
  • API used by the reasoner
  • Designed for efficiency, not usability
  • Uses ATerm library for representing terms
  • Fine-grained control over reasoning
  • Misses features (e.g. parsing & serialization)
  • Pros: Efficiency, fine-grained control
  • Cons: Low usability, missing features
Manchester OWLAPI
  • API designed for OWL
  • Closely tied to OWL structural specification
  • Support for many syntaxes (RDF/XML, OWL/XML, OWL functional, Turtle, ...)
  • Native SWRL support
  • Integration with reasoners
  • Support for modularity and explanations
  • Pros: OWL-centric API
  • Cons: Not as stable, no SPARQL support
  • More info: http://owlapi.sf.net
Jena API
  • RDF framework developed by HP labs
  • An RDF API with OWL extensions
  • In-memory and persistent storage
  • Built-in rule reasoners and integrated with Pellet
  • SPARQL query engine
  • Pros: Mature and stable and ubiquitous
  • Cons: Not great for handling OWL, no specific OWL 2 support
  • More info: http://jena.sf.net
Advanced Pellet programming

Main processing/reasoning steps:

  1. Loading data from Jena to Pellet
  2. Consistency checking
  3. Classification [Optional]
    • Compute subClassOf and equivalentClass inferences between all named classes
  4. Realization [Optional]
    • Compute instances for all named classes

The steps should be performed in the given order. There is no need to repeat any step unless the underlying data changes. Loading and consistency checking are mandatory; classification and realization are optional, and are performed only if required by a query (sketches of triggering queries follow the list below).

  • Queries triggering classification:
    • Querying for equivalent classes
    • Querying for (direct or all) sub/super classes
    • Querying for disjoint/complement classes
  • Queries triggering realization:
    • Querying for direct instances of a class
    • Querying for (direct or all) types of an individual
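For instance, against the BioCaster data queried later in this section (a sketch only; the bc: prefix is the BioCaster namespace), queries of roughly these shapes would trigger classification and realization respectively:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bc: <http://biocaster.nii.ac.jp/biocaster#>
# asks for subclasses of a named class: triggers classification
SELECT ?c WHERE { ?c rdfs:subClassOf bc:DISEASE }
# asks for instances of a named class: triggers realization
SELECT ?x WHERE { ?x rdf:type bc:DISEASE }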

An axiom can be interpreted under the open world assumption (OWA), as a regular OWL axiom, or under the closed world assumption (CWA), as an integrity constraint (IC). How to use ICs in OWL? Two easy steps (a sketch of the resulting validation query follows the list):

  1. Specify which axioms should be ICs
  2. Validate ICs with Pellet
  • Ontology developer
    • Develop ontology as usual
    • Separate ICs from regular axioms (annotation, separation of files, named graphs, ...)
  • Pellet IC validator
    • Translates ICs into SPARQL queries automatically
    • Execute SPARQL queries with Pellet
    • Query results show constraint violations
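To give a feel for the translation, consider a hypothetical constraint that every term:Term must have a term:hasReferent value (the vocabulary is from the grid configuration earlier in this chapter, but the constraint itself is made up for illustration, and the query is a sketch, not actual Pellet IC validator output). The violating individuals are exactly the query answers:

PREFIX term: <http://tfs.cc/term/>
# instances of term:Term with no asserted referent violate the constraint
SELECT ?violator
WHERE {
  ?violator a term:Term .
  FILTER NOT EXISTS { ?violator term:hasReferent ?referent }
}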

The Pellet reasoner can be used to query TF ontologies as follows.

pellet.sh query -q query.sparql $TF_HOME/cnv/bio/bc2e.owl

Here is an example query:

# Give me all items that are members of DISEASE and
# tell me all classes they belong to.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX bc: <http://biocaster.nii.ac.jp/biocaster#>
CONSTRUCT { ?X rdf:type ?C }
WHERE {
  ?X rdf:type bc:DISEASE .
  ?X rdf:type ?C .
}

The result of the query is a sub-ontology which reveals that the BioCaster ontology makes some category errors, in particular, these triples:

<rdf:Description rdf:about="http://biocaster.nii.ac.jp/biocaster1#NON_HUMAN_138">
  <rdf:type rdf:resource="http://biocaster.nii.ac.jp/biocaster#DISEASE_24"/>
</rdf:Description>
<rdf:Description rdf:about="http://biocaster.nii.ac.jp/biocaster1#DISEASE_49">
  <rdf:type rdf:resource="http://biocaster.nii.ac.jp/biocaster#SubCountry"/>
</rdf:Description>

Pellet does not support DESCRIBE queries. The PELLET engine only answers graph patterns. The Mixed engine uses the Pellet engine for the graph pattern part of a query and Jena ARQ for the other SPARQL constructs. The Pellet ARQ engine option loads an ontology into a Pellet knowledge base and runs the ARQ engine on triples retrieved by the Pellet reasoner.

Early on, we considered adapting the Pellet SPARQL-DL reasoner / query engine to the TF repository network so that it can carry out relayed queries in a network of T