About AAA and messy Linked Data

One of the most powerful selling points of Linked Data is the AAA concept: Anybody can say Anything about Any Topic. This concept is nice because it gives a sense of freedom. “I do not have to work with data or models that do not completely satisfy my requirements, but I can use parts and add my own to it.” Indeed, that is exactly what is beautiful about it. No one can provide information that works well for everybody. Because of cultural differences, points of view and different use cases, it is necessary that different sources describe the same topic.

However, the AAA concept should not be seen as a license to stop applying standards. After all, standards are created to guarantee interoperability. The whole purpose of creating Linked Data is to provide interoperable data, both syntactically and semantically!

What I see is that syntactic interoperability in Linked Data is not an issue: people apply these standards fine. The issue is semantic interoperability. The whole point of creating semantically correct Linked Data is sometimes neglected or ignored, and often misunderstood. Does it matter? Yes! Because if you don’t apply proper semantics, you create messy Linked Data and the people who use it can’t rely on its meaning. You pretend to provide something interoperable, but in reality you provide something that only you understand. Plus you are throwing away half of the potential of Linked Data. Simply a waste of technical resources.


Discover Linked Data in an endpoint

In this post I describe a quick and easy way to discover what is available in a SPARQL endpoint.

For some people, SPARQL seems a little strange. It is actually very simple when you remember that the SPARQL query engine evaluates triple patterns. So, when you write a query, the SPARQL engine responds with the values from the triples that match that pattern.
SPARQL does not only query data, it can also query the data model at the same time. A good approach to discover what is in a particular SPARQL endpoint is to query the data model first. Then you have an idea of the content of the repository.

To find out which main topics there are in a repository, you can execute the query below. Be nice to the endpoint and LIMIT your query to a reasonable number of returned rows. I assume here that the OWL schema is used; some repositories use the RDFS schema, so querying rdfs:Class might also be necessary:

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT *
WHERE {
  ?topic a owl:Class .
}
LIMIT 10
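
If you do not know in advance whether a repository declares its classes with OWL or with RDFS, one query can ask for both. This is only a sketch: the variable name ?kind is my own choice, and it assumes an endpoint that supports SPARQL 1.1 (for the VALUES keyword).

```sparql
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?topic ?kind
WHERE {
  # try both class vocabularies in one go
  VALUES ?kind { owl:Class rdfs:Class }
  ?topic a ?kind .
}
LIMIT 10
```

The ?kind column tells you which of the two schemas was used for each class.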

Next you would perhaps like to know what data is associated with the classes. You extend the previous SPARQL query with a triple like this:

?data a ?topic .

So the total query is:

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT *
WHERE {
  ?topic a owl:Class .
  ?data a ?topic .
}
LIMIT 100

Oops, you might get a lot of information now, for example when you are querying DBpedia. So perhaps it is time to use an actual value that you retrieved with your query. The WHERE clause then shrinks to a single statement, for example:

SELECT *
WHERE {
  ?data a <http://dbpedia.org/ontology/Artist> .
}
LIMIT 100
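
Before picking a class to drill into, it can help to see how much data sits behind each class. A possible sketch is the aggregate query below. The variable name ?instances is my own, it needs a SPARQL 1.1 endpoint, and on a very large repository it may be slow or time out.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?topic (COUNT(?data) AS ?instances)
WHERE {
  ?topic a owl:Class .
  ?data a ?topic .
}
GROUP BY ?topic
ORDER BY DESC(?instances)   # biggest classes first
LIMIT 20
```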

To find out what else is described about our data through its properties, add:

 ?data ?prop ?val .

Or, using real values in our DBpedia example, use only this statement in the WHERE clause:

<http://nl.dbpedia.org/resource/Sam_Most> ?prop ?val .
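
Written out in full, that single statement could be wrapped like this. It is a minimal sketch, to be run against the endpoint that actually holds this resource; the LIMIT value is an arbitrary choice of mine.

```sparql
SELECT ?prop ?val
WHERE {
  # everything that is stated about this one resource
  <http://nl.dbpedia.org/resource/Sam_Most> ?prop ?val .
}
LIMIT 100
```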

So the total general query is:

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT *
WHERE {
  ?topic a owl:Class .
  ?data a ?topic .
  ?data ?prop ?val .
}
LIMIT 100

Not quite, there might be more! Since relations have a direction, it is possible that there are incoming relations that we miss. So add the incoming relations as follows. As we have seen, this general query as a whole is only useful in small repositories.

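PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT *
WHERE {
  # restrict ?data to instances of a class in both branches
  ?topic a owl:Class .
  ?data a ?topic .
  {
    # outgoing relations: ?data is the subject
    ?data ?prop ?val .
  }
  UNION
  {
    # incoming relations: ?data is the object
    ?val ?prop ?data .
  }
}
LIMIT 100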
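
On a large endpoint the same both-directions idea works better for one concrete resource. Here is a sketch for the DBpedia example used earlier; the ?direction and ?other variable names are my own, and BIND requires SPARQL 1.1.

```sparql
SELECT ?direction ?prop ?other
WHERE {
  {
    # the resource is the subject
    <http://nl.dbpedia.org/resource/Sam_Most> ?prop ?other .
    BIND ("outgoing" AS ?direction)
  }
  UNION
  {
    # the resource is the object
    ?other ?prop <http://nl.dbpedia.org/resource/Sam_Most> .
    BIND ("incoming" AS ?direction)
  }
}
LIMIT 100
```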

 

Semantic Modelling part 2: “Red Ball” explained

So, have you worked your way through all the Red Ball modelling examples of my previous post? Now what, which one to choose? Let’s systematically discuss them all. To make reading and comparing easier, open the modelling examples page and read on. To explain some of the terms I use, I have made links to a thesaurus that is running on my demo machine (see also this post). This thesaurus cites from the Protege OWL tutorial.

To understand the difference between the options, remember that OWL was developed to create knowledge systems that are understood by humans and machines. So to verify whether you have created meaningful code, imagine that you are a machine: what is documented in the code that helps you understand the concept of a Red Ball?

Option 1 just declares a class with the name RedBall. This has meaning for humans, but for a machine it could just as well be called “Thing2”[1]. This can still be a valid modelling approach in a situation where other concepts are already described. In OWL it is allowed to include definitions from other sources in your model. In fact, this happens all the time. The code below helps the machine understand that a RedBall is in fact a type of Ball (which is declared in the example.com namespace):

:RedBall
      a       owl:Class ;
      rdfs:subClassOf <http://example.com/toys/Ball> .

Option 2 defines RedBall as a specialisation of Ball. As a human I read it like this: a RedBall is a (kind of) Ball. So, when we know what a Ball is, we also know what a RedBall is. It is a Ball with some extra characteristics.

What does it mean in OWL[2]? The property rdfs:subClassOf defines that (quote) if a class C is a subclass of a class C’, then all instances of C will also be instances of C’ (unquote). So when an instance of RedBall is declared, it is also an instance of Ball. This brings us to the question: what does the machine know about a Ball? Everything that is known about a Ball is inherited by RedBall. And, since RedBall is a separate class, we can add extra meaning to it that is not valid for the parent class Ball.
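
To see the subclass relation at work without a reasoner, a SPARQL 1.1 property path can follow rdfs:subClassOf for you. The sketch below assumes the Ball class from the example.com namespace mentioned above; the toys: prefix name is my own.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX toys: <http://example.com/toys/>

SELECT ?ball ?class
WHERE {
  ?ball a ?class .
  # zero or more subClassOf steps: Ball itself, RedBall, and any deeper subclass
  ?class rdfs:subClassOf* toys:Ball .
}
LIMIT 100
```

An instance that is only declared as a RedBall still shows up in the result, because RedBall is a subclass of Ball.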

In option 3 we created an instance of Ball with the name RedBall. An instance is different from a class because an instance is an existing thing (it is data) and a class is a conceptual thing (part of a data model – the semantic model). So we have told the machine that we have an actual Ball, and that its name is “RedBall”. As a human we now know that it has the colour Red (since this is stated in the name), but as a machine we do not know anything about RedBall except that it is an actual Ball.

For option 4 we use a construction with owl:DatatypeProperty. A datatype property assigns a value to an instance of a Thing. The value can be of any XSD datatype, or of a user-defined datatype. By using a property, we are finally starting to model with a bit more meaning: we define a property that can be used to add meaning to an instance of a class.

In semantic web modelling you can define as many properties as you like, every property adding its bit of meaning. There are predefined properties that belong to an existing schema (such as the RDFS schema with its much-used rdfs:label property, or the Dublin Core schemas), or you can specify your own. If you do create your own, remember that the property is the middle part of the RDF triple statement, so it should be a verb. Otherwise the statement cannot be read as natural language by humans; personally I consider this bad modelling.

If a statement is modelled with a datatype property, the object part cannot be described any further (other than that it is of an XSD datatype). In our example “Red” cannot be further explained. In other words: the semantic network (graph) stops at a datatype property.

When we model with an object property on the other hand, our option 5, we can describe what we mean by the colour red. Since by definition an object property links two individuals (instances of classes), we have the opportunity to describe both the subject and the object in detail.
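
The difference between the two kinds of property shows up when you query. In the sketch below the namespace http://example.com/redball# is purely hypothetical, the property name hasColour is the one used in the Red Ball examples, and the ?kind column is my own addition.

```sparql
PREFIX :     <http://example.com/redball#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?ball ?colour ?kind ?colourLabel
WHERE {
  ?ball :hasColour ?colour .
  # a literal value ends the graph; a resource can be described further
  BIND (IF(isLiteral(?colour), "literal, the graph stops here", "resource, can be described further") AS ?kind)
  # only matches when the colour is itself a described resource
  OPTIONAL { ?colour rdfs:label ?colourLabel }
}
LIMIT 100
```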

Option 6 introduces a new schema, the SKOS schema. This schema is very useful when you want to describe the meaning of things in a simple, yet very powerful manner. SKOS is meant for documenting thesauri and taxonomies. These are basically dictionaries (with properties such as synonym, abbreviation and definition) with generic relations (broader term, narrower term, related term, etc.).

The semantics of the SKOS schema are implemented in many inference engines. This gives you the opportunity to specify only the necessary statements, while the inference engine creates the statements that logically follow from them. Example: in your model you define that A skos:broader B. The inference engine will now generate B skos:narrower A for you. This saves you a lot of data entry!
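
If your repository has no inference engine switched on, you can mimic this single rule yourself. The CONSTRUCT query below is only a sketch of that one inverse relation, not of the full SKOS semantics.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# for every "A skos:broader B", produce the inverse "B skos:narrower A"
CONSTRUCT { ?b skos:narrower ?a }
WHERE     { ?a skos:broader ?b }
```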

Also note that the SKOS concepts are instances of the class skos:Concept. Instances are data. It is good practice to give data records a meaningless identifier, so in the example I have given the identifiers the form C001 and C002. The skos:prefLabel property gives us the actual name of the instance.
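
Because the identifiers are meaningless, you always ask for the labels when you query a thesaurus. A sketch of such a query, with variable names of my own choosing:

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?concept ?label ?narrower ?narrowerLabel
WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  # not every concept has narrower concepts, so keep this part optional
  OPTIONAL {
    ?concept skos:narrower ?narrower .
    ?narrower skos:prefLabel ?narrowerLabel .
  }
}
ORDER BY ?label
LIMIT 100
```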

The skos:narrower property describes the relation between “Ball” and “RedBall” as hierarchic. This (quote) indicates that one is in some way more general (“broader”) than the other (“narrower”) (unquote). It does not say with respect to which criterion the relation is hierarchic! This could be anything, like size, weight and indeed colour. To describe what a RedBall exactly is, and how it relates to other Things, we need the power of the OWL schema.

Options 7 and 8 are two examples of OWL constructs. In OWL a Thing is assigned meaning via its properties. These examples show how owl:Restriction can be used for describing (option 7) and defining (option 8) a RedBall. The difference between the two lies in the direction of the meaning of the statement. In option 7 the following is true: if something is a RedBall, then it must have the value ColourRed for the property hasColour and it must be of type Ball. The statement works in one direction only. In option 8 the reverse also holds: if something hasColour ColourRed and it is a Ball, then it must be a RedBall. This can be proven by an inference engine (also called a reasoner). The statement works in both directions.
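
To make the extra direction of option 8 tangible, here is a sketch of what a reasoner concludes, written as a CONSTRUCT query. It only imitates the inference, it is not how a reasoner works internally. The namespace http://example.com/redball# is hypothetical, and the toys: prefix stands for the example.com namespace in which Ball is declared.

```sparql
PREFIX :     <http://example.com/redball#>
PREFIX toys: <http://example.com/toys/>

# option 8 only: a Ball with the colour ColourRed is classified as a RedBall
CONSTRUCT { ?thing a :RedBall }
WHERE {
  ?thing a toys:Ball ;
         :hasColour :ColourRed .
}
```

With option 7 this conclusion is not allowed; there the reasoner can only go from RedBall to “is a Ball and hasColour ColourRed”.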

The To Learn page of my blog lists sources that give you in-depth background information on the above, in particular the Protege sources and the book “Semantic Web for the Working Ontologist” by Allemang and Hendler.

[1] This must be Thing2 since Thing is the top concept of any ontology

[2] Actually, rdfs:subClassOf is part of RDF Schema (RDFS)