Over the years I have created and worked with ontologies in a broad range of domains, amongst them water management, food safety, construction and education. I am not exactly an expert on these particular domains.. I build ontologies. So for a detailed description of a particular domain, only subject matter experts (SMEs) know exactly how to describe the Things. When it comes to arranging the taxonomy tree and describing the relations between concepts, we have to work together. Usually SMEs are not trained in OWL, and that is the issue I want to address in this post: we need an instrument that helps us understand what we mean!
Object-oriented information analysts normally use UML as an instrument to model and communicate with SMEs. The obvious thing to do would be to use UML for modelling and transform the UML model into OWL. In practice, this is not as easy as it seems, and it has been the subject of many research projects. The OMG itself has published an Ontology Definition Metamodel, but this has not been widely adopted by OWL software vendors.
OWL modellers often use a graph model to illustrate what has been modelled. This kind of illustration does not allow for the depiction of constructs that are more complex than class – relation – class. Graphical modelling of RDFS could be done like this, but when it comes to OWL you really need an instrument that can illustrate more complex constructs like cardinality, transitivity and the like.
There is a tool that can help out here: Fluent Editor, from Cognitum. I have known this product for a couple of years, and loved it. Now comes the even better news.. the developers of Fluent Editor (FE) have recently added many fantastic new features that you just have to see.
So what exactly is FE?
FE is an ontology development tool that works with natural language. You enter the body of knowledge into FE in Controlled Natural Language (CNL), and in the background FE parses these sentences into OWL! Everybody understands CNL, since it is very close to natural language, yet structured.. so gone is the communication gap between SMEs and the OWL modeller.
This is how the editor looks (click to enlarge).
Plus: FE has a reasoning engine that everybody can understand. You enter natural language in the interface and in the background FE executes DL or RL reasoning to present the results in natural language. (click image to enlarge)
OWL modellers do not have to be disappointed that they are put out of a job.. of course the generated OWL has to be verified by them, and to support this FE has a number of features:
complex expressions (e.g. value partitions, disjointness)
modal expressions (must, should, can and their negations)
OWL/XML notation preview
RDF/XML notation preview
syncing with Protégé (click image to enlarge)
integration with R for, amongst others, drawing the ontology graph
connecting to Ontorion server (which is the collaborative version)
I can only say: you have to check it out, and you will be surprised by the capabilities of the product.
I have promised to give some examples of semantic errors in Linked Data. As already mentioned, semantic errors are easy to make, and hard to detect.
Syntax errors can become semantic statements
My first advice is to always use an ontology editor for syntax validation. Feel free to write RDF in a text editor, but check it in an ontology editor, because most of the time syntax errors are detected and the tool refuses to open the file:
But do your check meticulously! Syntax errors may turn into semantic statements!
For example, you create a Linked Data file with a text editor, or even with an RDF conversion tool like TARQL, and your finger slips while typing, so that you mistakenly write rdf:typer instead of rdf:type:
:Z rdf:typer owl:Class .
..and you decide to do a syntax check with an ontology editor such as TopBraid Composer (TBC). You will be surprised that TBC does not detect this as a syntax error, because in an open world it isn't one. TBC shows it as a normal predicate of :Z in the Resource form:
..and assumes that you have created it as your own local reference to something in the RDF schema. The only way you can see that this class is different, and perhaps specified wrong, is the icon it gets in the user interface:
The globe icon in TBC means that it cannot find an rdf:type description of the element, and it is assumed that this description is present somewhere on the web (hence the globe icon).
In a closed world a typo like this would instantly be recognised as a syntax error. But since we are developing for an open world it now becomes a semantic statement!
Incorrect use of vocabulary definitions
A semantic error is most of the time caused by incorrect use of a vocabulary. To prevent these kinds of errors, the documentation of the vocabulary as a whole must be consulted before using it; the vocabulary definition must be treated as a standard. Just browsing through the classes and properties and knowing their names is not enough to apply the semantics properly.
Applying the wrong type of data to the range of a property
Somebody wants to specify that “The director of the movie StarWars is George Lucas”. The schema.org vocabulary is used.
This example shows a statement that is wrong because a string is used as the value of the property “director”. According to the vocabulary specification this value must be an instance of the type schema:Person.
Another example uses schema:Season. What is wrong here is that it is assumed that “Season” refers to the part of the year. But schema.org defines schema:Season as a series of episodes of a TV or radio show. You cannot use the schema:Season class to describe this sentence, even though the word is a homonym. The schema.org vocabulary has attached properties to the class that refer to something that is a TV or radio show. Everything on the web that is described with rdf:type schema:Season is just that.
Some vocabularies have a property with a name that could be mistaken for a class. Dublin Core is one of those. For “persons” Dublin Core has the more general class “Agent”. So anyone who is looking for something that represents a person, and just skims the list of available entities, could come up with dc:creator or dc:publisher. These are properties, and can never be used as an object in a triple (OWL Full perhaps excluded). So:
:GeorgeLucas rdf:type dc:creator .
is incorrect semantics. It is not picked up as such by TBC: when you write it in a text editor and open the file in TBC, the triple is there, but invisible in the user interface!
These kinds of errors are easily made with LODRefine. This tool is used for mapping tabular data to RDF, but it does not check what range is required, whether you should apply a property or a class, or what the exact meaning of an entity is.
Less obvious semantic errors are logical entailment consequences.
Probably among the most difficult semantic errors to understand and to detect are logical entailment errors. Logically entailed statements are the logical consequences of other statements. The RDFS and OWL schemas (a necessary building block of Linked Data) are full of logic. To generate these consequences one has to execute a set of rules via an inference engine or reasoner. Basically, if no such rules are executed on the data, the semantic correctness of the data is unknown.
Now there’s a lot to say about this. That it is impractical, impossible (e.g. what happens to the semantic correctness if you combine Linked Data sets) and all that. The point is, that you should be aware of the logic in RDFS/OWL and be aware that you might be creating something that is semantically wrong or even inconsistent. As far as I am concerned for creating Linked Data it is not necessary to study in depth RDFS/OWL logics. I think it is sufficient that you know just enough about reasoning to avoid these kind of errors. Just enough is at least this:
The domain or range that you specify for a property determines to which class an instance that carries that property belongs. For example, when I say that the property hasWheel has domain Car and range Wheel:
..and I create an instance of “Thing” (say, :MyThing) that has a Wheel, here :SomeWheel:

:MyThing a owl:Thing ;
    rdfs:label "My thing"@en ;
    :hasWheel :SomeWheel .
..and I run the reasoner, then the instance not only belongs to the class Thing, but also to the class Car:
:MyThing a owl:Thing , :Car ;
    rdfs:label "My thing"@en ;
    :hasWheel :SomeWheel .
This can lead to unwanted results, especially when more than one class is used in a domain or range. It is best to always test this with a reasoner, or to refrain from using domain and range on properties.
What happens if you just publish and do not run a reasoner? Then somebody else could run a reasoner on your data and be confused or misled by the statements that can be derived from it.
In a previous post I expressed my worries about the fact that the importance of semantics is sometimes neglected. In my opinion this creates messy Linked Data that is useless when it comes to interoperability.
There are syntax errors and there are semantic errors. A syntax error happens, for example, when you forget to end a Turtle statement with a period. Linked Data tools such as ontology editors and triple stores will prevent you from making these kinds of mistakes. However, if you choose to write RDF with a text editor and publish it as a file on a web server (which is perfectly fine), you are on your own.
A semantic error is a mistake that you make while working with the vocabulary that describes the data. When Linked Data is semantically incorrect, it is not fit to be combined with other Linked Data: SPARQL queries do not work, data ends up in the wrong places. It is like merging two spreadsheets without taking care of the column descriptions.
It is very easy to create a semantic error, and the bad news is that they are rather hard to detect. Why is this? Because the tools were not designed to be restrictive! This might seem a little strange at first, but because of the AAA slogan (Anyone can say Anything about Anything) and the fact that we are working in an open world, the tools allow you maximum freedom to create, and they assume that what you create is intended and correct.
The lesson here is that you have to be aware of this, and really, really understand the tools and the vocabularies that you are working with. Yes, this is hard work, but it is challenging and therefore fun! If you don't take the time and effort, you are creating Linked Data that does not make sense to the world, only to you... another silo.