I have promised to give some examples of semantic errors in Linked Data. As already mentioned, semantic errors are easy to make, and hard to detect.
Syntax errors can become semantic statements
My first advice is to always use an ontology editor for syntax validation. Feel free to write RDF in a text editor, but check it in an ontology editor: most of the time the editor detects syntax errors and refuses to open the file.
But do your check meticulously! Syntax errors may turn into semantic statements!
For example, suppose you create a Linked Data file with a text editor, or even with an RDF conversion tool like TARQL, and your finger slips while typing so that you mistakenly write rdf:typer instead of rdf:type:
:Z rdf:typer owl:Class .
..and you decide to do a syntax check with an ontology editor such as TopBraid Composer (TBC). You will be surprised that TBC does not detect this as a syntax error, because in an open world it isn’t one. TBC shows rdf:typer as a normal predicate of :Z in the Resource form..
..and assumes that you have created it as your own local reference to something in the RDF schema. The only way you can see that this class is different, and perhaps specified incorrectly, is the icon it gets in the user interface.
The globe icon in TBC means that it cannot find an rdf:type description of the element, so it assumes that the description is present somewhere on the web (hence the globe icon).
In a closed world a typo like this would instantly be recognised as a syntax error. But since we are developing for an open world it now becomes a semantic statement!
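One way to catch such a slip without relying on an editor is to compare every term in a well-known namespace against the terms that namespace actually defines. The following is a minimal Python sketch of that idea, not a feature of TBC or TARQL. The function name `unknown_rdf_terms`, the example.org IRIs and the hand-typed, partial list of RDF terms are assumptions made for illustration, and triples are modelled as plain tuples of strings.

```python
# Namespace IRI of the RDF vocabulary.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Partial, hand-typed list of local names defined in the RDF namespace
# (an assumption for this sketch; extend it before relying on it).
KNOWN_RDF_TERMS = {
    "type", "Property", "Statement", "subject", "predicate", "object",
    "first", "rest", "nil", "List", "value", "langString", "XMLLiteral",
    "HTML", "Bag", "Seq", "Alt",
}


def unknown_rdf_terms(triples):
    """Return IRIs that sit in the RDF namespace but are not known RDF terms."""
    suspects = set()
    for triple in triples:
        for term in triple:
            if not term.startswith(RDF_NS):
                continue
            local = term[len(RDF_NS):]
            # rdf:_1, rdf:_2, ... are valid container membership properties.
            is_member = local.startswith("_") and local[1:].isdigit()
            if local not in KNOWN_RDF_TERMS and not is_member:
                suspects.add(term)
    return suspects


# Hypothetical data: the first triple contains the typo from the text.
triples = [
    ("http://example.org/Z", RDF_NS + "typer", "http://www.w3.org/2002/07/owl#Class"),
    ("http://example.org/Y", RDF_NS + "type", "http://www.w3.org/2002/07/owl#Class"),
]
print(unknown_rdf_terms(triples))
```

The same closed list could be built for the RDFS and OWL namespaces. The check deliberately treats those few namespaces as a closed world, which is reasonable because you are not supposed to mint new terms in them.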
Incorrect use of vocabulary definitions
A semantic error is most often caused by incorrect use of a vocabulary. To prevent these kinds of errors, consult the vocabulary documentation as a whole before using it, and treat the vocabulary definition as a standard. Browsing through the classes and properties and knowing their names is not enough to apply the semantics properly.
Applying the wrong type of data to the range of a property
Somebody wants to specify that “The director of the movie StarWars is George Lucas”. The schema.org vocabulary is used.
Example of wrong use of the vocabulary:
:StarWars schema:director "George Lucas"^^xsd:string .
This statement is wrong because a string is used as the value of the property “director”. According to the vocabulary specification, the value must be an instance of the type schema:Person:
:StarWars schema:director :GeorgeLucas .
:GeorgeLucas rdf:type schema:Person .
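A check for this kind of mistake can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Literal` type, the `EXPECTS_RESOURCE` table and the function name `literal_range_violations` are hypothetical, and the table is filled in by hand from the vocabulary documentation rather than read from schema.org itself.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A typed literal value, e.g. "George Lucas"^^xsd:string."""
    value: str
    datatype: str


# Hypothetical, hand-maintained table: properties whose value should be
# a resource of the given class rather than a literal.
EXPECTS_RESOURCE = {"schema:director": "schema:Person"}


def literal_range_violations(triples):
    """List (subject, property, expected class) for literals used where a resource is expected."""
    problems = []
    for subject, prop, obj in triples:
        if prop in EXPECTS_RESOURCE and isinstance(obj, Literal):
            problems.append((subject, prop, EXPECTS_RESOURCE[prop]))
    return problems


wrong = [(":StarWars", "schema:director", Literal("George Lucas", "xsd:string"))]
right = [
    (":StarWars", "schema:director", ":GeorgeLucas"),
    (":GeorgeLucas", "rdf:type", "schema:Person"),
]
print(literal_range_violations(wrong))
print(literal_range_violations(right))
```

The sketch only tells literals from resources. It does not verify that :GeorgeLucas is really typed as schema:Person, which would need a second lookup over the rdf:type triples.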
Incorrect use of the meaning of an entity
Somebody wants to say: “The summer season starts on June 21st”
Example of wrong use of the vocabulary:
:Summer rdf:type schema:Season ; :startsOn "2014-06-21"^^xsd:date .
What is wrong here is the assumption that “Season” refers to a part of the year. schema.org defines schema:Season as a series of episodes of a TV or radio show. You cannot use the schema:Season class for this sentence, even though the two meanings share the same word. The properties that schema.org attaches to the class all refer to a TV or radio show, and everything on the web that is described with rdf:type schema:Season is just that.
The correct approach is to define your own Season class:
:Season rdf:type owl:Class .
:Summer rdf:type :Season .
or use the Season class from another ontology (e.g. http://www.ontotext.com/proton/protonext#Season )
:Summer rdf:type pext:Season ; :startsOn "2014-06-21"^^xsd:date .
One more error of this type:
Properties are used as classes and vice-versa
Some vocabularies have a property whose name could be mistaken for a class. Dublin Core is one of those. For “persons” Dublin Core has the more general class “Agent”. So anyone who is looking for something that represents a person and only skims the list of available entities could come up with dc:creator or dc:publisher. These are properties, and can never be used as the class in an rdf:type statement (OWL Full perhaps excluded). So:
:GeorgeLucas rdf:type dc:creator .
is semantically incorrect. TBC does not pick this up when you write the triple in a text editor and open the file in TBC: the triple is there, but it is invisible in the user interface!
These kinds of errors are easily made with LODRefine. This tool is used for mapping tabular data to RDF; however, it does not check what range is required, whether you should apply a property or a class, or what the exact meaning of an entity is.
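Since tools like these do not check it for you, a property that is used as a class can be spotted with a simple scan over the rdf:type statements. The Python sketch below is a heuristic under stated assumptions, not something LODRefine or TBC provides: the function name `properties_used_as_classes` and the hand-picked `KNOWN_PROPERTIES` set are hypothetical, and any term that also appears as a predicate in the same data is treated as a property.

```python
# Hypothetical, hand-picked set of terms known to be properties.
KNOWN_PROPERTIES = frozenset({"dc:creator", "dc:publisher"})


def properties_used_as_classes(triples, known_properties=KNOWN_PROPERTIES):
    """Return (subject, term) pairs where a property is the object of rdf:type."""
    # Any term that occurs in predicate position is treated as a property too.
    used_as_predicate = {predicate for _, predicate, _ in triples}
    suspects = set(known_properties) | used_as_predicate
    return [
        (subject, obj)
        for subject, predicate, obj in triples
        if predicate == "rdf:type" and obj in suspects
    ]


data = [
    (":GeorgeLucas", "rdf:type", "dc:creator"),      # the mistake from the text
    (":StarWars", "dc:creator", ":GeorgeLucas"),     # dc:creator used as intended
    (":GeorgeLucas", "rdf:type", "schema:Person"),   # a proper class
]
print(properties_used_as_classes(data))
```

Because OWL Full allows a term to be both a property and a class, a hit from this scan is a reason to look at the vocabulary documentation, not proof of an error.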
Less obvious semantic errors are logical entailment consequences
Logical entailment errors are probably among the most difficult semantic errors to understand and to detect. Entailed statements are the logical consequences of other statements. The RDFS and OWL schemas (a necessary building block of Linked Data) are full of logic. To generate these consequences one has to execute a set of rules via an inference engine or reasoner. Basically, if no such rules are executed on the data, the semantic correctness of this data is unknown.
Now there’s a lot to say about this: that it is impractical, impossible (e.g. what happens to the semantic correctness if you combine Linked Data sets) and all that. The point is that you should be aware of the logic in RDFS/OWL, and aware that you might be creating something that is semantically wrong or even inconsistent. As far as I am concerned, creating Linked Data does not require studying RDFS/OWL logic in depth. I think it is sufficient to know just enough about reasoning to avoid these kinds of errors. Just enough is at least this:
The value that you specify for domain or range of a property determines to what class an instance that holds that property belongs. For example, when I say that the property hasWheel has domain Car and range Wheel:
:hasWheel a owl:ObjectProperty ;
    rdfs:domain :Car ;
    rdfs:label "has wheel"@en ;
    rdfs:range :Wheel .
..and I create an instance of “Thing” that has a Wheel, here SomeWheel:
:MyThing a owl:Thing ;
    rdfs:label "My thing"@en ;
    :hasWheel :SomeWheel .
..and I run the reasoner, then the instance belongs not only to class Thing, but also to class Car:
:MyThing a owl:Thing , :Car ;
    rdfs:label "My thing"@en ;
    :hasWheel :SomeWheel .
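This could lead to unwanted results, especially when more than one value is used in domain or range: under RDFS semantics the instance is then inferred to belong to every listed class, not to just one of them. It is best to always test this with a reasoner, or to refrain from using domain and range in properties.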
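To see what a reasoner does with domain and range, the two RDFS rules involved can be sketched in plain Python. This is a toy illustration under stated assumptions, not a replacement for a real reasoner: the function name `infer_types` is hypothetical, triples are tuples of prefixed names, and the rules are applied in a single pass rather than repeated until nothing new is found.

```python
def infer_types(triples):
    """Apply the RDFS domain and range rules once and return the new rdf:type triples."""
    domains = {}
    ranges = {}
    # Collect the declared domain and range classes per property.
    for subject, predicate, obj in triples:
        if predicate == "rdfs:domain":
            domains.setdefault(subject, set()).add(obj)
        elif predicate == "rdfs:range":
            ranges.setdefault(subject, set()).add(obj)

    inferred = set()
    for subject, predicate, obj in triples:
        # Domain rule: the subject belongs to every domain class of the property.
        for cls in domains.get(predicate, ()):
            inferred.add((subject, "rdf:type", cls))
        # Range rule: the object belongs to every range class of the property.
        for cls in ranges.get(predicate, ()):
            inferred.add((obj, "rdf:type", cls))
    # Only report statements that were not already asserted.
    return inferred - set(triples)


triples = [
    (":hasWheel", "rdf:type", "owl:ObjectProperty"),
    (":hasWheel", "rdfs:domain", ":Car"),
    (":hasWheel", "rdfs:range", ":Wheel"),
    (":MyThing", "rdf:type", "owl:Thing"),
    (":MyThing", ":hasWheel", ":SomeWheel"),
]
for triple in sorted(infer_types(triples)):
    print(triple)
```

Running this on the hasWheel example yields the statement that :MyThing is a :Car, and also that :SomeWheel is a :Wheel, even though neither was written down. Adding a second rdfs:domain to :hasWheel would add a second inferred class for :MyThing.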
What happens if you just publish and do not run a reasoner? Then somebody else could run a reasoner on your data and be confused or misled by the statements that can be drawn from your data.