Archive for March, 2008

how binary relations beat tuples

Last week I was handed a puzzle by Francois Bry: “Why does RDF limit itself to binary relations? Why this deliberate lack of expressivity?”

Logical Equivalence Reply

My initial answer was that all tuples could be reduced to binary relations. So take a simple table like this:

User ID | name | address | birthday | course | homepage
1234 | Henry Story | 21 rue Saint Honoré, Fontainebleau, France | 29 July | philosophy | http://bblfish.net/
1235 | Danny Ayers | Loc. Mozzanella, 7, Castiglione di Garfagnana, Lucca, Italy | 14 Jan | semweb | http://dannyayers.com

The first row in the above table can be expressed as a set of binary relations as shown in this graph:

The same can clearly be done for the second row.
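To make the reduction concrete, here is a minimal sketch in Python of how one row can be broken into binary relations. The `ex:` prefix, the `ex:user/...` subject naming and the `row_to_triples` helper are my own illustrative assumptions, not part of any RDF library.

```python
# Hypothetical names throughout: the "ex:" prefix and row_to_triples
# are illustrative only, not an existing API.
def row_to_triples(row, key="User ID", prefix="ex:"):
    """Turn one table row (a dict) into a set of (subject, predicate, object) triples."""
    subject = f"{prefix}user/{row[key]}"
    return {
        (subject, f"{prefix}{column}", value)
        for column, value in row.items()
        if column != key and value is not None
    }

henry = {
    "User ID": "1234",
    "name": "Henry Story",
    "address": "21 rue Saint Honoré, Fontainebleau, France",
    "birthday": "29 July",
    "course": "philosophy",
    "homepage": "http://bblfish.net/",
}

triples = row_to_triples(henry)
for triple in sorted(triples):
    print(triple)
```

Each non-key column becomes one relation from the row's subject to the cell value, so a row with five data columns yields five triples.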

Since the two models express equivalent information, I would opt aesthetically for the graph over the tuples, since it requires fewer primitives, which tends to make things simpler and clearer. Perhaps that can already be seen in the way the above table is screaming out for refactoring: a person may easily have more than one homepage. Adding a new homepage relation to the graph is easy; doing this in a table is a lot less so, since it requires either a new column or a separate table.

But this line of argument will not convince a battle-worn database administrator. Both systems do the same thing. One is widely deployed, the other is not. So that is the end of the conversation. Furthermore, it seems clear that retrieving a row in a table is quick and easy. If you need chunks of information to be together, that beats the join that seems to be required in the graph version above. Pragmatics beats aesthetics hands down, it seems.

Global Distributed Open Data

The database engineer might have won the battle, but he will not win the war [1]. Wars are fought at a much higher level, on a global scale. The problem the Semantic Web is attacking is global data, not local data. On the Semantic Web, the web is the database, and data is distributed and linked together. In the Semantic Web use case the data won’t all be managed in one database by a few resource-constrained superusers, but distributed in different places and managed by the stakeholders of that information. In our example we can imagine three stakeholders of different pieces of information: Danny Ayers for his personal information, me for mine, and the university for its course information. This information will then be available as resources on the web, returning different representations, which in one way or another may encode graphs such as the ones below. Note that duplication of information is a good thing in a distributed network.

By working with the simplest binary relations, it is easy to cut information down to its most atomic units and publish them anywhere on the web, distributing the responsibility to different owners. This atomic nature of relations also makes it easy to merge information again. Doing this with tuples would be unnecessarily complex. Binary relations are a consequence of taking the open world assumption seriously in a global space. By using Uniform Resource Identifiers (URIs), it is possible for different documents to co-refer to the same entities, and to link together entities in a global manner.
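A rough sketch of why merging is easy: if each document is treated as a set of triples, merging is just set union, and duplicated statements collapse harmlessly. The identifiers and the `merge` helper below are hypothetical, and the sketch ignores blank nodes, which a real RDF merge has to keep distinct between documents.

```python
# Three hypothetical documents, each published by a different stakeholder.
henry_doc = {
    ("ex:user/1234", "ex:name", "Henry Story"),
    ("ex:user/1234", "ex:homepage", "http://bblfish.net/"),
    ("ex:user/1234", "ex:course", "ex:course/philosophy"),
}
university_doc = {
    ("ex:course/philosophy", "ex:title", "philosophy"),
    # The same statement Henry publishes; duplication does no harm.
    ("ex:user/1234", "ex:course", "ex:course/philosophy"),
}
danny_doc = {
    ("ex:user/1235", "ex:name", "Danny Ayers"),
    ("ex:user/1235", "ex:homepage", "http://dannyayers.com"),
}

def merge(*graphs):
    """Merge graphs by set union (simplified: no blank-node handling)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

web = merge(henry_doc, university_doc, danny_doc)
print(len(web), "distinct triples after merging")
```

Because the shared identifiers name the same entities in every document, the merged set links Henry to the course description published by the university without any schema negotiation.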

The Verbosity Critique

Another line of attack, similar to the first, could be that RDF is just too verbose. Imagine the relation children, which would relate a person to a list of their children. If one sticks just with binary relations, this is going to be very awkward to write out. In a graph it would look like this.

[Figure: a simple list as a graph]

Which in Turtle would give something like this:

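# Prefixes added so the snippet parses; the default namespace is an assumed example.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/> .

:Adam :children
     [ a rdf:List;
       rdf:first :joe;
       rdf:rest [ a rdf:List;
            rdf:first :jane;
            rdf:rest rdf:nil ];
     ] .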

which is clearly a bit more verbose than necessary. But that is not really a problem. One can develop, and Turtle has developed, a notation for writing out lists, so that one can write much more simply:

:Adam :children ( :joe :jane ) .
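The shorthand is only notation: it can be expanded mechanically back into binary relations. Here is a small sketch of that expansion in Python. The `expand_list` helper and the `_:bN` blank node labels are my own illustrative choices, and the sketch omits the `a rdf:List` type triples shown in the longer form.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def expand_list(subject, predicate, items):
    """Expand `subject predicate ( items... )` into binary triples.

    Hypothetical helper: each list cell gets a made-up blank node label.
    """
    if not items:
        # An empty list is simply rdf:nil.
        return [(subject, predicate, RDF_NIL)]
    nodes = [f"_:b{i}" for i in range(len(items))]
    triples = [(subject, predicate, nodes[0])]
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else RDF_NIL
        triples.append((nodes[i], RDF_FIRST, item))
        triples.append((nodes[i], RDF_REST, rest))
    return triples

triples = expand_list(":Adam", ":children", [":joe", ":jane"])
for triple in triples:
    print(triple)
```

A list of n items expands to 2n + 1 triples here, which is exactly the bookkeeping the list notation hides from the writer.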

This is clearly much easier to read and write than the previous way (not to speak of the equivalent in RDF/XML). RDF is a structure developed at the semantic level. Different notations can be developed to express the same content. The reason it works is that it uses URIs to name things.

Efficiency Considerations

So what about the implementation question: with tables, frequently accessed data is stored close together. This, it seems to me, is an implementation issue. One can easily imagine RDF databases that optimize the in-memory layout of their data at run time, in a just-in-time manner, depending on the queries received. This is analogous to the way a Java JIT compiler can in some cases outperform hand-crafted C: the JIT can take advantage of local factors such as the memory available on the machine, the type of CPU, and other details that a statically compiled C binary cannot know. So in the case of the list structure shown above, there is no reason why the database could not just place :joe and :jane in an array of pointers.
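As a toy illustration of that idea, and not a description of how any existing RDF database works, the following sketch keeps triples in a plain set and only builds a row-like structure for subjects that are queried often. The `AdaptiveStore` class, its `describe` method and the `hot_threshold` parameter are all hypothetical.

```python
from collections import Counter, defaultdict

class AdaptiveStore:
    """Toy triple store that caches a 'row' for subjects queried often."""

    def __init__(self, triples, hot_threshold=2):
        self.triples = set(triples)
        self.hot_threshold = hot_threshold
        self.hits = Counter()   # how often each subject was scanned
        self.rows = {}          # subject -> {predicate: [objects]}

    def _scan(self, subject):
        # Slow path: walk every triple, like a join over binary relations.
        row = defaultdict(list)
        for s, p, o in self.triples:
            if s == subject:
                row[p].append(o)
        return {p: sorted(objects) for p, objects in row.items()}

    def describe(self, subject):
        # Fast path: the subject is "hot" and already laid out as a row.
        if subject in self.rows:
            return self.rows[subject]
        self.hits[subject] += 1
        row = self._scan(subject)
        if self.hits[subject] >= self.hot_threshold:
            self.rows[subject] = row
        return row

store = AdaptiveStore([
    (":Adam", ":children", ":joe"),
    (":Adam", ":children", ":jane"),
    (":Eve", ":name", "Eve"),
])
first = store.describe(":Adam")
second = store.describe(":Adam")   # second query makes :Adam "hot"
print(second, ":Adam" in store.rows)
```

The logical model stays purely binary; only the physical layout changes in response to the query pattern, which is the separation the argument relies on.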

In any case, if one wants distributed decentralised data, there is no other way to do it. Pragmatism does have the last word.

Notes

  1. Don’t take the battle/war analogy too far please. Both DB technologies and Semantic Web ones can easily work together as demonstrated by tools such as D2RQ.

The Kiwi has started

Last week we were at the kick-off meeting of the KIWI project in Salzburg, Austria. We had a great time, and getting the people together was quite a success.

Participants in the project are:

We discussed the things that needed to be discussed, and far more. More importantly, we got to know each other. Beyond project matters, it is always good to sit together over a fine dinner or a drink. That is what makes a meeting like this a success; we should now carry this energy into the working process. Pictures can be found here. May the Kiwi fly high.


KIWI Kickoff Meeting: Great team and great project

Wednesday to Friday, we officially started the KIWI project with a great kickoff meeting in Maria Plain close to Salzburg. From my perspective, the kickoff meeting was almost a perfect event: teambuilding worked exceptionally well, the discussions were lively but always fair, everyone now has a good idea what the project really is about, [...]


KIWI starts agile

We are at the kick-off meeting of the KIWI project. KIWI has started quite well: a good atmosphere was established from the morning of the first day. I think this is mainly thanks to the agile games we played to break the ice :) We "puzzled" together our own program, which had initially been split into puzzle pieces (see the picture; the program is glued to the whiteboard). This way we actively learned what to expect over these 3 days. We also established a social network by throwing and catching a ball (see the second picture; the network is visible through the threads :) ). It was quite an interesting way to get to know each other. Whoever had the ball in his hands was supposed to talk about himself for half a minute. The first day was very interactive and full of discussions. Technology, methods, and use cases attracted a lot of input and exchange of ideas. I am looking forward to the other two days as well as to the rest of the project. The team seems good, and the project is interesting and a lot of fun.


New FP7 project KIWI launched

The new EU FP7 STREP project KIWI: Knowledge in a Wiki started on March 1, 2008. The project will research technological advances in semantic wikis, in particular reasoning, reason maintenance, personalization and user modelling, knowledge extraction, and advanced editing facilities for wikis. The new wiki framework will be tested in the context of software development and project management. Researchers, developers, and project managers from 7 institutes in Austria, Denmark, Germany, and the Czech Republic will meet at the KIWI project kick-off meeting on March 12-15, 2008 in Salzburg. Please visit the KIWI project page or contact Peter Dolog for more information.
